Voltage Lockdown: Investigating AMD's Recent AM5 AGESA Updates on ASRock's X670E Taichi
by Gavin Bonshor on May 16, 2023 12:00 PM EST- Posted in
- CPUs
- AMD
- ASRock
- Motherboards
- X3D
- Ryzen 7000
- X670E Taichi
- AGESA
It's safe to say that the last couple of weeks have been a bit chaotic for AMD and its motherboard partners. Unfortunately, it's been even more chaotic for some users with AMD's Ryzen 7000X3D processors. There have been several reports of Ryzen 7000 processors burning up in motherboards, and in some cases, burning out the chip socket itself and taking the motherboard with it.
Over the past few weeks, we've covered the issue as it's unfolded, with AMD releasing two official statements and motherboard vendors scrambling to ensure their users have been updating firmware in what feels like a grab-it-quick fire sale, pun very much intended. Not everything has been going according to plan, with AMD having released two new AGESA firmware updates through its motherboard partners to try and address the issues within a week.
The first firmware update made available to vendors, AGESA 1.0.0.6, addressed reports of SoC voltages being too high. This AGESA version put restrictions in place to limit that voltage to 1.30 V, and was quickly distributed to all of AMD's partners. More recently, motherboard vendors have pushed out even newer BIOSes which include AMD's AGESA 1.0.0.7 (BETA) update. With even more safety-related changes made under the hood, this is the firmware update AMD and their motherboard partners are pushing consumers to install to alleviate the issues – and prevent new ones from occurring.
In this article, we'll be taking a look at the effects of all three sets of firmware (AGESA 1.0.0.5c through 1.0.0.7) running on our ASRock X670E Taichi motherboard. The goal is to uncover what changes, if any, each firmware makes to key variables when running the AMD Ryzen 9 7950X3D, including SoC voltage and the current drawn under intensive memory-based workloads.
Here is our recent coverage of the Ryzen 7000X3D/7000 'burnout' issues, including two statements from AMD and official responses from ASUS and MSI:
- MSI Addresses CPU Voltages on AM5 Motherboards for Ryzen 7000X3D Processors
- AMD Issues Official Statement on Reported Ryzen 7000 Burnout Issues
- ASUS Issues Statement on Ryzen 7000X3D Processor Issues, Possible Voltage Issues with AMD EXPO
- AMD Issues Second Statement on Ryzen 7000 Burnout Issues: Caps SoC Voltages
AMD Ryzen 7000 AGESA Firmware: From 1.0.0.5c to 1.0.0.7 Within 32 Days
The first firmware update made available to vendors, AGESA 1.0.0.6, addressed reports of SoC voltages being too high, with new restrictions put in place to limit SoC voltage to 1.30 V. In the case of the board we've been using to dig deeper into the issues, the ASRock X670E Taichi, this was made available to the public on 4/27/23 through its 1.21 firmware update. More recently, on 5/4/23, ASRock made its latest 1.24.AS02 firmware available, which includes AMD's AGESA 1.0.0.7 (BETA) update.
The AGESA 1.0.0.7 (BETA) update is the firmware that AMD has most recently been rolling out to alleviate the burnout issues, not just for Ryzen 7000X3D chips with 3D V-Cache, but also across the broader Ryzen 7000 and AM5 ecosystem. Counting the initial AGESA 1.0.0.5c firmware that brought AMD's Ryzen 7000X3D support to AM5 motherboards, AMD has released a total of three major AGESA versions in the space of a mere 32 days, all of which ASRock has dutifully published for the X670E Taichi. We'll be using these three releases as the basis for our analysis of what's going on.
On top of this, AMD is also planning to release an even more robustly updated AGESA firmware, which could be in the coming weeks. Referred to internally as AGESA 1.0.0.9, we did reach out to AMD for comment on this, but our rep couldn't comment on "unannounced or internal only software stacks." It should also be noted that the current firmware at the time of writing available to users is a BETA version, implying that a newer AGESA is undoubtedly on its way. Still, the timescale of the release is anyone's guess currently.
Looking at the variations in AMD's AGESA updates over the last month, there hasn't been any official indication of changes beyond the bare minimum, at least not in ASRock's descriptions. The following is how ASRock describes each AGESA update:
- AGESA 1.0.0.5c: Initial support for Ryzen 7000X3D processors with 3D V-Cache.
- AGESA 1.0.0.6 (BETA): Improved memory compatibility, Optimizations for Ryzen 7000X3D, recommended update for Ryzen 7000X3D processors.
- AGESA 1.0.0.7 (BETA): Support for 48/24GB DDR5 memory modules.
The description of the changes, at least from the point of ascertaining what each AGESA is offering, is borderline pitiful. In none of the descriptions does it state what changes AMD has made to each AGESA firmware to address the current issues, which in all honesty, is a pretty big thing to omit. There are no indications whatsoever on ASRock's X670E Taichi BIOS page as to what each firmware changes, and with no public notes available to users, it's a case of "update to this firmware, it's recommended."
So what do we know about the changes? Well, we know the critical change going from AGESA 1.0.0.5c to the 1.0.0.6 and 1.0.0.7 versions is a lockdown on SoC voltage to 1.30 V. Previously, on the ASRock X670E Taichi with 1.0.0.5c, we were able to set the SoC voltage to 2.5 V, which would almost certainly result in frying our X3D chips like an egg.
Image Credit: Igor Wallossek, Igorslab.de
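To make the SoC voltage lockdown described above concrete, here is a minimal sketch of how a limit-enforcing firmware might treat a requested value. This is an illustration only: the helper name `apply_soc_limit` is hypothetical, and we are assuming the firmware clamps an over-limit request rather than rejecting it outright, which AMD has not documented publicly.

```python
# Cap described in the article for AGESA 1.0.0.6 and later.
SOC_VOLTAGE_LIMIT_V = 1.30


def apply_soc_limit(requested_v: float, limit_v: float = SOC_VOLTAGE_LIMIT_V) -> float:
    """Return the SoC voltage that would be applied under a hard cap.

    Hypothetical helper: assumes over-limit requests are clamped to the cap.
    """
    if requested_v <= 0:
        raise ValueError("requested voltage must be positive")
    return min(requested_v, limit_v)


if __name__ == "__main__":
    # 2.5 V is the value we could set on AGESA 1.0.0.5c before the lockdown.
    for requested in (1.25, 1.30, 2.5):
        applied = apply_soc_limit(requested)
        print(f"requested {requested:.2f} V -> applied {applied:.2f} V")
```

Under this model, the 2.5 V we could previously dial in on 1.0.0.5c would simply come out as 1.30 V, while anything at or below the cap passes through unchanged.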
As for the other changes coming with AGESA 1.0.0.6, according to Igor Wallossek, the Editor-in-chief of Igorslab.de, AMD has also added two new PROCHOT entries that point directly to combating overheating. PROCHOT essentially means Processor Hot, and it is a control mechanism designed to protect the processor from overheating. There are two implementations here. The first is the PROCHOT Control mechanism, which is precisely what it says on the tin. When the CPU hits a defined value, a PROCHOT Control signal is sent, and the CPU draws less power to try and mitigate temperatures and reduce the risk of damage.
The second mechanism is PROCHOT Deassertion Ramp Time, which dictates how long a processor takes to ramp its power back up after the initial PROCHOT Control signal has been deasserted. Essentially, PROCHOT Deassertion Ramp is the time it takes for the processor to get back up to normal parameters, and different variables, including the type of cooling, how aggressively it is run, and general heat dissipation quality, can dictate this time. If the processor is inadequately cooled, this can result in a longer deassertion ramp time, whereas more aggressive heat dissipation methods should theoretically allow for a quicker ramp-up time.
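The interplay between the two PROCHOT entries can be sketched as a toy model. This is not AMD's implementation: the function name, the linear ramp, and every number in the usage example (trip point, power levels, ramp length) are placeholder assumptions chosen purely to illustrate the behavior described above.

```python
def simulate_prochot(temps_c, trip_c, full_power_w, throttled_power_w, ramp_steps):
    """Return the power limit applied at each temperature sample.

    Toy model (all behavior assumed, not taken from AMD documentation):
    - While a sample is at or above trip_c, PROCHOT is asserted and the
      power limit drops to throttled_power_w.
    - Once the temperature falls below trip_c, PROCHOT is deasserted and
      the limit climbs linearly back to full_power_w over ramp_steps samples.
    """
    limits = []
    steps_since_deassert = None  # None means "not recovering from PROCHOT"
    for temp in temps_c:
        if temp >= trip_c:
            # PROCHOT asserted: throttle and restart the recovery counter.
            steps_since_deassert = 0
            limits.append(throttled_power_w)
            continue
        if steps_since_deassert is None:
            # Never throttled, or already fully recovered.
            limits.append(full_power_w)
            continue
        steps_since_deassert += 1
        if steps_since_deassert >= ramp_steps:
            # Ramp complete: back to normal parameters.
            steps_since_deassert = None
            limits.append(full_power_w)
        else:
            fraction = steps_since_deassert / ramp_steps
            limits.append(throttled_power_w + fraction * (full_power_w - throttled_power_w))
    return limits


if __name__ == "__main__":
    # Placeholder values for illustration only.
    samples = [80, 95, 96, 85, 84, 83, 82]
    print(simulate_prochot(samples, trip_c=95, full_power_w=100,
                           throttled_power_w=60, ramp_steps=4))
```

In this model, a longer `ramp_steps` value stands in for a longer deassertion ramp time: the processor spends more samples below full power after the thermal event has passed.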
The Story So Far: Gamers Nexus Deep-Dive - The Ryzen 7000 CORE Fundamental Issues
Before the rollout of new firmware, Steve Burke, the Editor-in-Chief of Gamers Nexus, and his team investigated the issues in-depth, including looking at the original fried hardware from Speedrookie. This includes a faulty, bulged-out Ryzen 7 7800X3D processor and his burnt ASUS ROG STRIX X670E-E Gaming motherboard. Instead of leaving the hardware to go through RMA, Steve Burke reached out to the user and offered to buy it from him, minimizing the RMA lead time and allowing Speedrookie to purchase new hardware.
The 38-minute, 46-second video is a very good watch, and we certainly recommend it, especially for those more interested in the inner workings (or issues) of the Ryzen 7000X3D and 7000 series processors. To summarize Steve's findings, we took away the following points:
- AMD Ryzen 7000X3D CPUs are shutting down too late to mitigate physical damage.
- ASRock, GIGABYTE, and MSI boards use a 116°C thermal trip point and ASUS uses 106°C, but the trip sometimes didn't work as intended.
- The thermal cut-off for Ryzen 7000X3D is supposed to be 106°C and 116°C for Ryzen 7000.
- With AMD EXPO enabled, ASUS boards set SoC voltage to 1.35 V up until BIOS 1202 (AGESA 1.0.0.6).
- ASUS's SoC Voltage settings were/are too high.
- The AGESA firmware rollout has been nothing short of chaos at this point.
- AMD is offering RMA (paying shipping both ways) on killed CPUs, even if EXPO has been used (at least in the US)
- No word on if motherboard vendors will honor the warranty (at the time of writing)
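The thermal trip figures in the list above can be lined up against the intended cut-offs with a few lines of code. This is a minimal sketch using only the temperatures quoted in the summary; the dictionary and function names are our own and purely illustrative.

```python
# Intended thermal cut-offs per CPU family, as quoted in the summary above.
INTENDED_CUTOFF_C = {"Ryzen 7000X3D": 106, "Ryzen 7000": 116}

# Thermal trip points per motherboard vendor, as quoted in the summary above.
VENDOR_TRIP_C = {"ASRock": 116, "GIGABYTE": 116, "MSI": 116, "ASUS": 106}


def trip_margin(vendor: str, cpu_family: str) -> int:
    """Degrees C by which a vendor's trip point exceeds the intended cut-off.

    A positive value means the board would shut down later than intended.
    """
    return VENDOR_TRIP_C[vendor] - INTENDED_CUTOFF_C[cpu_family]


def late_trippers(cpu_family: str) -> list:
    """Vendors whose trip point sits above the intended cut-off for a CPU family."""
    return sorted(v for v in VENDOR_TRIP_C if trip_margin(v, cpu_family) > 0)


if __name__ == "__main__":
    for family in INTENDED_CUTOFF_C:
        for vendor in VENDOR_TRIP_C:
            print(f"{vendor:9s} with {family}: {trip_margin(vendor, family):+d} C")
        print(f"  trips late for {family}: {late_trippers(family)}")
```

Going by these figures alone, a 116°C trip point sits 10°C above the intended cut-off for Ryzen 7000X3D parts, which is consistent with the finding that the X3D chips were shutting down too late.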
While Steve and his team at Gamers Nexus have gone deep into uncovering the root causes of the problem, one thing remains abundantly clear: the issue is not just one that relates to SoC voltage. There has certainly been some confusion between AMD and its motherboard partners in implementing the appropriate failsafes to prevent the CPU (and motherboard socket, for that matter) from burning into oblivion.
The other problem relates to ASUS here, with a more aggressive implementation of its SoC voltages, which Gamers Nexus confirmed in their testing as running too high. Before the AGESA firmware (1.0.0.6) update through BIOS version 1202, ASUS was overshooting SoC voltage by 0.05 V over AMD's newly imposed SoC voltage limit of 1.3 V.
Image Credit: Gamers Nexus
With leads soldered to the motherboard and connected to a digital multimeter, a 1.35 V SoC setting within the ASUS firmware (and with EXPO enabled) resulted in an observed 1.398 V from an SoC pad. This was typically even higher when probed at the choke, at an eye-watering 1.42 V. The fundamental problem is that the voltage ASUS's firmware requests and the voltage the SoC rail actually delivers don't line up. An additional 0.05 V on top of the recommended 1.30 V is a lot, to say the least, but adding roughly another 0.05 V on top of that can undoubtedly lead to dielectric degradation and possibly lead to dead CPUs and burnt motherboard sockets.
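The arithmetic behind those readings is simple enough to work through. The sketch below uses only the voltages quoted above (the 1.30 V limit, the 1.35 V firmware setting, and the 1.398 V and 1.42 V measurements reported by Gamers Nexus); the function and variable names are illustrative.

```python
AMD_SOC_LIMIT_V = 1.30    # AMD's newly imposed SoC voltage limit
FIRMWARE_SETTING_V = 1.35  # ASUS SoC setting with EXPO enabled, pre-BIOS 1202

# Multimeter readings reported by Gamers Nexus.
MEASURED_V = {"SoC pad": 1.398, "choke": 1.42}


def overshoot_mv(measured_v: float, reference_v: float) -> int:
    """Difference between two voltages, rounded to the nearest millivolt."""
    return round((measured_v - reference_v) * 1000)


if __name__ == "__main__":
    print(f"setting vs limit: {overshoot_mv(FIRMWARE_SETTING_V, AMD_SOC_LIMIT_V):+d} mV")
    for point, volts in MEASURED_V.items():
        over_setting = overshoot_mv(volts, FIRMWARE_SETTING_V)
        over_limit = overshoot_mv(volts, AMD_SOC_LIMIT_V)
        print(f"{point}: {over_setting:+d} mV over setting, {over_limit:+d} mV over limit")
```

The point of separating the two references is that there are two stacked problems: the firmware setting itself sits above AMD's limit, and the delivered voltage then sits above the firmware setting.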
In some preliminary testing of the effect of SoC voltage on stability with the latest AGESA 1.0.0.7 (BETA) firmware, the ASRock X670E Taichi automatically set 1.30 V on the SoC when we applied the EXPO memory profile of our G.Skill DDR5-6000 (2 x 16 GB) memory kit. Looking for a lower stable value, we tried 1.15 V, which was a no-go, and even 1.20 V was a no-go. We eventually settled on 1.25 V on the SoC for this kit and our Ryzen 9 7950X3D, where stability in memory-intensive benchmarks was solid.
Perhaps one of the biggest things to come out of Gamers Nexus's testing was that AMD is now offering RMA support for users who have used EXPO memory profiles, something which normally voids the warranty on AMD's processors. Whether or not other regions intend to honor these RMA requests hasn't been confirmed, but it's unlikely to be an issue.
Still, it's a good gesture for users with damaged CPUs from an issue that is entirely not their fault. Motherboard vendors, on the other hand, operate within their policies and parameters, and it may be trickier getting an RMA on a damaged motherboard simply because AMD doesn't control motherboard vendors' RMA policies. We would hope in good faith that motherboard vendors will honor the warranty in instances of these burnout issues, but we cannot confirm if they will at this time.
Our Testing: Methodology, Test Setup, and Hardware
To summarize the reason for testing AMD's AGESA firmware, we aren't trying to replicate burning our Ryzen 7000X3D samples – enough processors have already been sacrificed for science. For that matter, we certainly didn't see or smell any smoke coming from our ASRock X670E Taichi during testing, so we'll take that as a good sign.
Our purpose for testing is to highlight any differences or variations in parameters and power-related elements coming from AMD's latest AGESA packages. This includes looking at rails like SoC voltage and Package Power Tracking (PPT) output from the AM5 CPU socket. As AMD has dialed down what users and motherboard vendors can apply with regard to SoC voltage to 1.30 V, it's worth noting that all of ASRock's firmware we've tested on the X670E Taichi in this piece automatically sets SoC voltage to 1.30 V. As we don't have the necessary tools and equipment to solder leads to the motherboard to observe 'physical' voltages, we are relying on HWInfo's reporting prowess, as well as looking at multiple temperatures.
We also did some in-house stability testing against the new SoC voltage limits, running a fresh batch of tests on our Ryzen 9 7950X3D paired with a G.Skill DDR5-6000 (2 x 16 GB) memory kit with its AMD EXPO memory profile enabled. We found that things weren't stable until we applied 1.25 V on the SoC within the firmware. At 1.25 V on the SoC, our kit was rock solid, even in memory-intensive workloads and benchmarks.
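The stepping approach we took to find that value can be sketched as a simple upward search. This is a rough outline, not an automated tool: `is_stable` is a hypothetical callback standing in for a manual run of memory-intensive benchmarks at a given SoC voltage, and the 0.05 V step and 1.15 V starting point mirror the values we tried by hand.

```python
from typing import Callable, Optional


def lowest_stable_voltage(
    is_stable: Callable[[float], bool],
    start_v: float = 1.15,
    step_v: float = 0.05,
    limit_v: float = 1.30,
) -> Optional[float]:
    """Step the SoC voltage upward until the system is stable.

    is_stable is a hypothetical stand-in for a manual stability test.
    Returns the first stable voltage, or None if nothing at or below
    limit_v (the AGESA cap) passes.
    """
    steps = 0
    while True:
        # Round to avoid floating-point drift accumulating across steps.
        candidate = round(start_v + steps * step_v, 3)
        if candidate > limit_v:
            return None
        if is_stable(candidate):
            return candidate
        steps += 1


if __name__ == "__main__":
    # Mirrors our observations for this kit: 1.15 V and 1.20 V failed,
    # while 1.25 V was stable.
    observed = {1.15: False, 1.2: False, 1.25: True, 1.3: True}
    print(lowest_stable_voltage(lambda v: observed[v]))
```

Searching from the bottom up keeps the SoC at the lowest voltage that passes, rather than simply accepting the 1.30 V that the board applies automatically with EXPO.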
That has been our focus: pushing the memory as hard as we can to ensure complete stability. A lot of the fanfare surrounding the issue has unfairly landed on AMD's EXPO profiles as being one of the causes; they are not. We know that CPU-intensive workloads will generate more heat, but that isn't what we've been investigating. We're looking for variations in current and power between the different firmware versions to see if AMD (and ASRock) has made optimizations within its framework to reduce them. Current, or more specifically overcurrent combined with the integrated failsafes being bypassed, is one of the key concerns in the burnouts.
Our test bench for our AGESA (AM5) update testing is as follows:
AMD Ryzen 7950X3D AGESA Test Platform | |
CPU | Ryzen 9 7950X3D ($699), 16 Cores, 32 Threads, 120 W TDP |
Motherboard | ASRock X670E Taichi (BIOS 1.18, 1.21 & 1.24.AS02) |
Memory | G.Skill Trident Z5 Neo 2x16 GB: DDR5-5200 (JEDEC Default), DDR5-6000 CL34 (EXPO Profile) |
Cooling | EK-AIO Elite 360 D-RGB 360 mm AIO |
Storage | SK Hynix 2TB Platinum P41 PCIe 4.0 x4 NVMe |
Power Supply | Corsair HX1000 |
GPU | AMD Radeon RX 6950 XT, Driver 31.0.12019 |
Operating System | Windows 11 22H2 |
For our choice of workloads, we're relying on the Memory Test Suite from Openbenchmarking.org via Phoronix to implement our memory-intensive workloads. Although some of these workloads aren't optimized for Windows and don't run on it, we used the CacheBench benchmark, which uses multiple data types across read, write, modify, and read/write/modify combined. As part of the LLCbench low-level architectural characterization benchmark suite, CacheBench is designed to test memory and cache bandwidth performance and relies on C++ toolchains and compilers to build.
Read on for more analysis.
39 Comments
haplo602 - Thursday, May 18, 2023 - link
but they brag about the sunk cost and beg for shop purchases throughout the whole video ...they could have at least tried a regular 7800 in the same mobo and compare the voltage readings to have at least something relevant ...
dan121loveu - Wednesday, May 17, 2023 - link
Gamernexus premise on poor SOC overvolt is wrong. You have to question if their other findings are reliable. Unofficial Asus video on die sense and socket sense, not a new thing, they had one few years back in the same channel. They have even put this features in their ROG X670E homepage. Auto-translate to english video here https://www.youtube.com/watch?v=l8r4LVV_jsQ
Silver5urfer - Wednesday, May 17, 2023 - link
Very interesting and a good video. Major takeaway point are - Die Sense is enabled on default for the C8H and other premium ASUS boards, and GN has same board, and that means HWInfo should actually show proper data. This is like those premium Intel boards which have a switch that allows directly reading from Die Sense on the fly (HWinfo helps to spit out that data). Only diff here it's already default you get 2 measurement points to compare it to !
So GN fked up by using the farthest point, and they did not even check or bothered to check the HWinfo reading when they are doing their big space age big brain investigation and throw a bunch of terms at the audience to confuse massively. 1M views already are not free so the sensationalism has to get the maximum coverage skipping all the points in the middle and it works, always did because avg consumer is a dumb rock.
I presume AT's X670 Taichi is also similar design, so the VSoC reading is accurate (guessing). Now if you go to Igor's Lab they have a Gigabyte Aorus X670E which is also using Mobo read points, they make the similar mistake like GN using board readouts farthest ones to measure the Voltages which gives them again wrong picture how GB is also shoving 0.03-0.05 volts more despite the new AGESA, and they are ignoring the HWinfo readings as I see only one measurement result from them too. Why not just double check the HWInfo readings instead go for the singular measurement point ?
Top notch journalism nowadays lmao..
haplo602 - Thursday, May 18, 2023 - link
so you boot up the system into BIOS/UEFI, change the settings you want to test and then it reboots and fries the CPU right away ... HOW do you get anything from HWinfo there when you did not even make it to Windows with a functional CPU ? but I am sure you would figure out a way genius ...
Silver5urfer - Thursday, May 18, 2023 - link
You completely seem to miss the point. Let alone understand this. I'm talking about the behavior while you are talking about a scenario of all CPUs are dying and so boo hoo I cannot get a read out. If I had an Intel board I'd knew it because I as I alr mentioned I know Die Sense on Apex exists directly.
Techie2 - Wednesday, May 17, 2023 - link
The key takeaway for me is that the majority of mobo makers caused the burnouts by automatically bumping the SoC voltages too high when EXPO is enabled. It does not surprise me at all that Asus had excessive voltage. IME they always push the envelope to get minutely better performance numbers and great reviews. It does not surprise me that Asrock used a proper mobo design. They have been doing this for many years IME.
dicobalt - Thursday, May 18, 2023 - link
This reminds me when Intel released Core was first released and the memory controller would get easily fried. I was one of the fryers.
biostud - Friday, May 19, 2023 - link
I'm using the 1.21 BIOS with 1.0.0.6 AGESA in my ASRock X670E PRO RS, it only applies 1.25V voltage for vSoc on my 7800X3D.
GreenReaper - Saturday, May 27, 2023 - link
And that's likely all you need. It's both the minimum and maximum for me - I wasn't able to go beyond 1.25V (which incidentally measured as 1.272V...) without running into issues, while going below it showed computation errors in y-Cruncher's HNT test - a great tool for diagnosing Infinity Fabric instability, which also applies to BOINC tasks.