Voltage Lockdown: Investigating AMD's Recent AM5 AGESA Updates on ASRock's X670E Taichi
by Gavin Bonshor on May 16, 2023 12:00 PM EST - Posted in
- CPUs
- AMD
- ASRock
- Motherboards
- X3D
- Ryzen 7000
- X670E Taichi
- AGESA
It's safe to say that the last couple of weeks have been a bit chaotic for AMD and its motherboard partners. Unfortunately, it's been even more chaotic for some users with AMD's Ryzen 7000X3D processors. There have been several reports of Ryzen 7000 processors burning up in motherboards, and in some cases, burning out the chip socket itself and taking the motherboard with it.
Over the past few weeks, we've covered the issue as it's unfolded, with AMD releasing two official statements and motherboard vendors scrambling to get their users onto updated firmware in what feels like a grab-it-quick fire sale, pun very much intended. Not everything has gone according to plan, with AMD releasing two new AGESA firmware updates through its motherboard partners within a week to try and address the issues.
The first firmware update made available to vendors, AGESA 1.0.0.6, addressed reports of SoC voltages being too high. This AGESA version put restrictions in place to limit that voltage to 1.30 V, and was quickly distributed to all of AMD's partners. More recently, motherboard vendors have pushed out even newer BIOSes which include AMD's AGESA 1.0.0.7 (BETA) update. With even more safety-related changes made under the hood, this is the firmware update AMD and their motherboard partners are pushing consumers to install to alleviate the issues – and prevent new ones from occurring.
In this article, we'll be taking a look at the effects of all three sets of firmware (AGESA 1.0.0.5c through 1.0.0.7) running on our ASRock X670E Taichi motherboard. The goal is to uncover what changes, if any, there are to variables when using the AMD Ryzen 9 7950X3D, including SoC voltages and current drawn under intensive memory-based workloads.
Here is our recent coverage of the Ryzen 7000X3D/7000 'burnout' issues, including two statements from AMD and official responses from ASUS and MSI:
- MSI Addresses CPU Voltages on AM5 Motherboards for Ryzen 7000X3D Processors
- AMD Issues Official Statement on Reported Ryzen 7000 Burnout Issues
- ASUS Issues Statement on Ryzen 7000X3D Processor Issues, Possible Voltage Issues with AMD EXPO
- AMD Issues Second Statement on Ryzen 7000 Burnout Issues: Caps SoC Voltages
AMD Ryzen 7000 AGESA Firmware: From 1.0.0.5c to 1.0.0.7 Within 32 Days
The first firmware update made available to vendors, AGESA 1.0.0.6, addressed reports of SoC voltages being too high, with new restrictions put in place to limit SoC voltage to 1.30 V. In the case of the board we've been using to try and dig deeper into the issues, the ASRock X670E Taichi, this was made available to the public on 4/27/23 through its 1.21 firmware update. More recently, on 5/4/23, ASRock released its latest 1.24.AS02 firmware, which includes AMD's AGESA 1.0.0.7 (BETA) update.
The AGESA 1.0.0.7 (BETA) update is the firmware that AMD has most recently been rolling out to alleviate the burnout issues, not just for Ryzen 7000X3D chips with 3D V-Cache, but also across the broader Ryzen 7000 and AM5 ecosystem. Counting from the initial AGESA 1.0.0.5c firmware that brought Ryzen 7000X3D support to AM5 motherboards, AMD has released a total of three major AGESA versions in the space of a mere 32 days, all of which ASRock has dutifully published for the X670E Taichi. We'll be using these as the baseline for our analysis of what's going on.
On top of this, AMD is also planning to release an even more robustly updated AGESA firmware, which could arrive in the coming weeks. It is referred to internally as AGESA 1.0.0.9; we did reach out to AMD for comment on this, but our rep couldn't comment on "unannounced or internal only software stacks." It should also be noted that the current firmware available to users at the time of writing is a BETA version, implying that a newer AGESA is undoubtedly on its way. Still, the timescale of the release is anyone's guess currently.
Looking at the variations in AMD's AGESA updates over the last month, there hasn't been any official indication of changes beyond the bare minimum, at least not from ASRock's descriptions. The following is how ASRock describes each AGESA update:
- AGESA 1.0.0.5c: Initial support for Ryzen 7000X3D processors with 3D V-Cache.
- AGESA 1.0.0.6 (BETA): Improved memory compatibility, Optimizations for Ryzen 7000X3D, recommended update for Ryzen 7000X3D processors.
- AGESA 1.0.0.7 (BETA): Support for 48/24GB DDR5 memory modules.
The description of the changes, at least from the point of ascertaining what each AGESA is offering, is borderline pitiful. In none of the descriptions does it state what changes AMD has made to each AGESA firmware to address the current issues, which in all honesty, is a pretty big thing to omit. There are no indications whatsoever on ASRock's X670E Taichi BIOS page as to what each firmware changes, and with no public notes available to users, it's a case of "update to this firmware, it's recommended."
So what do we know about the changes? Well, we know the critical change going from AGESA 1.0.0.5c to the 1.0.0.6 and 1.0.0.7 versions is a lockdown on SoC voltage to 1.30 V. Previously, on the ASRock X670E Taichi with 1.0.0.5c, we were able to set the SoC voltage to 2.5 V, which would almost certainly result in frying our X3D chips like an egg.
Image Credit: Igor Wallossek, Igorslab.de
According to Igor Wallossek, the Editor-in-Chief of Igorslab.de, the other change coming with AGESA 1.0.0.6 is that AMD has added two new PROCHOT entries that point directly to combating overheating. PROCHOT essentially means Processor Hot, and it is a control mechanism designed to protect the processor from overheating. There are two implementations here. The first is the PROCHOT Control mechanism, which is precisely what it says on the tin. When the CPU hits a defined value, a PROCHOT Control signal is sent, and the CPU draws less power to try and bring temperatures down and reduce the risk of damage.
The second mechanism is PROCHOT Deassertion Ramp Time, which dictates how long a processor takes to ramp its power back up after the initial PROCHOT Control signal has been deasserted. Essentially, PROCHOT Deassertion Ramp is the time it takes for the processor to get back up to normal parameters, and different variables, including cooling, the aggressiveness of said cooling, and general heat dissipation quality, can dictate this time. If the processor is inadequately cooled, this can result in a longer deassertion ramp time, whereas more aggressive heat dissipation methods should theoretically allow for a quicker ramp-up time.
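To make the interplay between the two entries easier to picture, here is a toy model in Python. It is only a sketch under stated assumptions and not AMD's implementation: the class name `ProchotModel`, the temperature thresholds, the power limits, and the step-based ramp are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProchotModel:
    """Toy model of PROCHOT throttling; every constant here is illustrative."""
    trip_temp_c: float = 95.0         # hypothetical assertion threshold
    release_temp_c: float = 85.0      # hypothetical deassertion threshold
    full_power_w: float = 120.0       # hypothetical unthrottled power limit
    throttled_power_w: float = 60.0   # hypothetical limit while PROCHOT is asserted
    ramp_steps: int = 5               # deassertion ramp time, in simulation steps

    def __post_init__(self):
        self.asserted = False
        self.power_limit_w = self.full_power_w

    def step(self, temp_c: float) -> float:
        """Update and return the power limit for one time step."""
        if temp_c >= self.trip_temp_c:
            # PROCHOT Control: cut power as soon as the trip point is reached.
            self.asserted = True
            self.power_limit_w = self.throttled_power_w
        elif self.asserted and temp_c <= self.release_temp_c:
            # Temperature has recovered, so the signal is deasserted.
            self.asserted = False
        if not self.asserted and self.power_limit_w < self.full_power_w:
            # Deassertion ramp: restore power gradually rather than all at once.
            increment = (self.full_power_w - self.throttled_power_w) / self.ramp_steps
            self.power_limit_w = min(self.full_power_w, self.power_limit_w + increment)
        return self.power_limit_w


model = ProchotModel()
for temp in [80, 96, 90, 84, 84, 84, 84, 84]:
    print(f"{temp} C -> power limit {model.step(temp):.0f} W")
```

In this sketch the ramp length is a fixed setting, while the quality of cooling decides how quickly the temperature falls back below the release threshold, which is one way to read the relationship between cooling and ramp time described above.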
The Story So Far: Gamers Nexus Deep-Dive - The Ryzen 7000 CORE Fundamental Issues
Before the rollout of new firmware, Steve Burke, the Editor-in-Chief of Gamers Nexus, and his team investigated the issues in-depth, including looking at the original fried hardware from Speedrookie. This includes a faulty and bulged-out Ryzen 7 7800X3D processor and his burnt ASUS ROG STRIX X670E-E Gaming motherboard. Instead of leaving the hardware to go through RMA, Steve Burke reached out to the user and offered to buy it from him, minimizing the RMA lead time and allowing Speedrookie to purchase new hardware.
The video, which runs 38:46, is a very good watch, and we certainly recommend it, especially for those more interested in the inner workings (or issues) of the Ryzen 7000X3D and 7000 series processors. To summarize Steve's findings, we took away the following points:
- AMD Ryzen 7000X3D CPUs are shutting down too late to mitigate physical damage.
- ASRock, GIGABYTE, and MSI have a 116°C thermal trip point and ASUS has 106°C, but these sometimes didn't work as intended.
- The thermal cut-off for Ryzen 7000X3D is supposed to be 106°C and 116°C for Ryzen 7000.
- AMD EXPO enabled on ASUS is 1.35V on SoC voltage up until BIOS 1202 (AGESA 1.0.0.6).
- ASUS's SoC Voltage settings were/are too high.
- The AGESA firmware rollout has been nothing short of chaos at this point.
- AMD is offering RMA (paying shipping both ways) on killed CPUs, even if EXPO has been used (at least in the US)
- No word on if motherboard vendors will honor the warranty (at the time of writing)
While Steve and his team at Gamers Nexus have gone deep into uncovering the root causes of the problem, one thing remains abundantly clear: the issue doesn't relate to SoC voltage alone. There has certainly been some confusion between AMD and its motherboard partners in implementing the appropriate failsafes to prevent the CPU (and motherboard socket, for that matter) from burning into oblivion.
The other problem relates to ASUS here, with a more aggressive implementation of its SoC voltages, which Gamers Nexus confirmed in their testing as running too high. Before the AGESA firmware (1.0.0.6) update through BIOS version 1202, ASUS was overshooting SoC voltage by 0.05 V over AMD's newly imposed SoC voltage limit of 1.3 V.
Image Credit: Gamers Nexus
With leads soldered to the motherboard and connected to a digital multimeter, a 1.35 V SoC setting within the ASUS firmware (and with EXPO enabled) resulted in an observed 1.398 V at an SoC pad. This was typically even higher when probed at the choke, at an eye-watering 1.42 V. This points to a fundamental problem: ASUS's firmware and the SoC rails themselves aren't cohabiting well with each other. An additional 0.05 V on top of the recommended 1.30 V is a lot, to say the least, but adding roughly another 0.05 V on top of that can undoubtedly lead to dielectric degradation and possibly to dead CPUs and burnt motherboard sockets.
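The overshoot arithmetic is easy to lose track of, so the short Python sketch below works it through using only the figures quoted above (the 1.30 V limit, the 1.35 V setting, and the two multimeter readings). The helper name `overshoot` is ours and purely illustrative.

```python
AMD_SOC_LIMIT_V = 1.30  # SoC voltage cap introduced with AGESA 1.0.0.6


def overshoot(measured_v: float, reference_v: float) -> tuple:
    """Return (overshoot in volts, overshoot as a percentage of the reference)."""
    delta = measured_v - reference_v
    return round(delta, 3), round(100.0 * delta / reference_v, 1)


# Figures reported from the Gamers Nexus measurements on the ASUS board.
readings = {
    "firmware setting": 1.35,
    "SoC pad (multimeter)": 1.398,
    "choke (multimeter)": 1.42,
}

for label, volts in readings.items():
    delta_v, pct = overshoot(volts, AMD_SOC_LIMIT_V)
    print(f"{label}: {volts:.3f} V, {delta_v:+.3f} V ({pct:+.1f}%) vs. the 1.30 V limit")
```

Comparing each reading against the 1.35 V setting instead of the limit, by passing `1.35` as the reference, isolates how much of the excess comes from the board delivering more than the firmware requested.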
Doing some preliminary testing on the effect of SoC voltage on stability with the latest AGESA 1.0.0.7 (BETA) firmware, we found that the ASRock X670E Taichi would automatically preset 1.30 V on the SoC when applying the EXPO memory profile of our G.Skill DDR5-6000 (2 x 16 GB) kit. Lowering it manually, 1.15 V was a no-go, and even 1.20 V was a no-go. We eventually settled on 1.25 V on the SoC for this kit and our Ryzen 9 7950X3D, where stability in memory-intensive benchmarks was solid.
Perhaps one of the biggest things to come out of Gamers Nexus's testing was that AMD is now offering RMA support for users who have used EXPO memory profiles, something which normally voids the warranty on AMD's processors. Whether or not other regions intend to honor these RMA requests hasn't been confirmed, but it's unlikely to be an issue.
Still, it's a good gesture for users with damaged CPUs from an issue that is entirely not their fault. Motherboard vendors, on the other hand, operate within their policies and parameters, and it may be trickier getting an RMA on a damaged motherboard simply because AMD doesn't control motherboard vendors' RMA policies. We would hope in good faith that motherboard vendors will honor the warranty in instances of these burnout issues, but we cannot confirm if they will at this time.
Our Testing: Methodology, Test Setup, and Hardware
To summarize the reason for testing AMD's AGESA firmware, we aren't trying to replicate burning our Ryzen 7000X3D samples – enough processors have already been sacrificed for science. For that matter, we certainly didn't see or smell any smoke coming from our ASRock X670E Taichi during testing, so we'll take that as a good sign.
Our purpose for testing is to highlight any differences or variations in parameters and power-related elements coming from AMD's latest AGESA packages. This includes looking at rails like SoC voltage, as well as Package Power Tracking (PPT) output from the AM5 CPU socket. As AMD has dialed down what users and motherboard vendors can apply with regard to SoC voltage to 1.30 V, it's worth noting that every ASRock firmware we've tested on the X670E Taichi in this piece automatically sets SoC voltage to 1.30 V. As we don't have the necessary tools and equipment to solder leads to the motherboard and observe 'physical' voltages, we are relying on HWInfo's reporting prowess, as well as looking at multiple temperatures.
We also did some in-house stability testing against the new SoC voltage limits, running a fresh batch of tests on our Ryzen 9 7950X3D paired with a G.Skill DDR5-6000 (2 x 16 GB) memory kit with its AMD EXPO memory profile enabled. We found that things weren't stable until we applied 1.25 V on the SoC within the firmware. At 1.25 V on the SoC, our kit was rock solid, even in memory-intensive workloads and benchmarks.
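The trial-and-error described above amounts to a simple upward search for the lowest SoC voltage that holds up. The Python sketch below captures that logic only; `lowest_stable_voltage` and the `is_stable` callback are hypothetical names, and in practice the "callback" is a manual process of setting the voltage in firmware, rebooting, and running memory-intensive benchmarks.

```python
from typing import Callable, Optional, Sequence


def lowest_stable_voltage(
    candidates_v: Sequence[float],
    is_stable: Callable[[float], bool],
    limit_v: float = 1.30,
) -> Optional[float]:
    """Return the lowest candidate voltage that passes `is_stable`, or None.

    Candidates above `limit_v` are never tried, so the search stays within the cap.
    """
    for volts in sorted(candidates_v):
        if volts > limit_v:
            break
        if is_stable(volts):
            return volts
    return None


# Stand-in for manual testing: the pass/fail outcomes reported for this
# particular kit and CPU (1.15 V and 1.20 V failed, 1.25 V passed).
observed = {1.15: False, 1.20: False, 1.25: True}
print(lowest_stable_voltage([1.15, 1.20, 1.25], lambda v: observed[v]))
```

These outcomes apply to one memory kit and one processor sample; other combinations may need more or less SoC voltage.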
That has been our focus: pushing the memory as hard as we can to ensure complete stability. A lot of the fanfare surrounding the issue has unfairly singled out AMD's EXPO profiles as one of the causes; they are not. We know that CPU-intensive workloads will generate more heat, but that isn't what we're investigating here. We're looking for variations in current and power between the different firmware versions to see if AMD (and ASRock) has made optimizations within its framework to reduce them. Current, or more specifically overcurrent combined with the integrated failsafes being bypassed, is one of the key concerns in the burnouts.
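One way to line up logged sensor data from each firmware version is sketched below in Python. It assumes each run was exported as a CSV log with one column per sensor; the function names and the column labels in the usage note (such as "SoC Voltage [V]" and "PPT [W]") are hypothetical placeholders, since real column names depend on the monitoring tool and board.

```python
import csv
import io
from statistics import mean


def summarize_log(csv_text: str, columns: list) -> dict:
    """Return {column: (mean, max)} for the requested columns of one sensor log."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for name in columns:
        # Skip rows where the sensor column is missing or empty.
        values = [float(row[name]) for row in rows if row.get(name)]
        summary[name] = (mean(values), max(values)) if values else (None, None)
    return summary


def compare_firmware(logs: dict, columns: list) -> dict:
    """Map each firmware label to its per-column (mean, max) summary."""
    return {label: summarize_log(text, columns) for label, text in logs.items()}
```

Usage would look something like `compare_firmware({"1.18": text_a, "1.21": text_b, "1.24.AS02": text_c}, ["SoC Voltage [V]", "PPT [W]"])`, where each `text_*` is the contents of a log captured under the same workload. Reporting the maximum alongside the mean matters here, because brief spikes are more relevant to overcurrent concerns than averages.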
Our test bench for our AGESA (AM5) update testing is as follows:
AMD Ryzen 7950X3D AGESA Test Platform | |
CPU | Ryzen 9 7950X3D ($699), 16 Cores, 32 Threads, 120 W TDP |
Motherboard | ASRock X670E Taichi (BIOS 1.18, 1.21 & 1.24.AS02) |
Memory | G.Skill Trident Z5 Neo 2 x 16 GB: DDR5-5200 (JEDEC Default), DDR5-6000 CL34 (EXPO Profile) |
Cooling | EK-AIO Elite 360 D-RGB 360 mm AIO |
Storage | SK Hynix 2TB Platinum P41 PCIe 4.0 x4 NVMe |
Power Supply | Corsair HX1000 |
GPUs | AMD Radeon RX 6950 XT, Driver 31.0.12019 |
Operating Systems | Windows 11 22H2 |
For our choice of workloads, we're relying on the Memory Test Suite from Openbenchmarking.org via Phoronix to implement our memory-intensive workloads. Some of these workloads aren't optimized for Windows or don't run on it, so we used the CacheBench benchmark, which uses multiple data types across read, write, modify, and combined read/write/modify tests. Part of the LLCbench low-level architectural characterization benchmark suite, CacheBench is designed to test memory and cache bandwidth performance, and it relies on a C/C++ compiler toolchain to build.
Read on for more analysis.
39 Comments
Golgatha777 - Tuesday, May 16, 2023 - link
I'm on a B650E-F with a 7700X CPU. With all that's gone on here with the X3D parts, I think I'll give it awhile to sort things out before I think about upgrading to the 7800X3D. I upgraded to 1406 when I first got my motherboard, and I believe was the first one with X3D support (until ASUS started daily edits of their BIOS and CPU support lists anyway). I have a very stable system, so I plan to sit on the sidelines and not upgrade my BIOS until there's a non-beta one that's been listed for at least a couple of months.
GreenReaper - Saturday, May 27, 2023 - link
I think if it was actually damaged it would have likely shown up in improper working. Most of the damage seems to have been people manually increasing SoC voltage, being allowed to do so.
The_Assimilator - Tuesday, May 16, 2023 - link
This embarrassing disaster is the cherry on top for the dismal and disappointing Zen 4 launch. AMD managed to replicate the original Zen's rubbish memory controller, but this time around they "fixed" it by allowing board partners to overvolt it through the roof - with the inevitable result. Play stupid games, win stupid prizes.
Sunrise089 - Tuesday, May 16, 2023 - link
I’ve vaguely followed this story but am obviously still somewhat out of the loop.
I appreciate this article, but it seems like the conclusion here could be “ASRock boards don’t suffer from the overvolting issue,” no?
So what IS the real issue here? Is it just that Asus had bad voltage settings applied when users used faster memory? And that Asus just assumed they’d be fine because AMD would have protections in the chip that would prevent damage?
Is there more to it than that? Because otherwise I don’t understand why this is being presented as a general issue affecting AMD and multiple board partners if it’s really only Asus-specific.
meacupla - Wednesday, May 17, 2023 - link
There is more to it than that, yes. The older BIOS allowed 7000X3D to be overvolted when XMP was enabled. Asus was the most egregious, but this same flaw seems to have existed on all vendors.
Asus mobos had a fail safe that didn't kick in properly.
It seems that AMD chips also don't have a fail safe that kicks in properly either.
AMD and mobo makers endorse fast RAM speeds, but AMD only "officially" supports DDR5-5200.
To ensure maximum RAM compatibility, Asus likely pushed Vsoc too high to get DDR5-6000 to 6400 to work on their mobos.
A high delta between Vsoc and Vram has resulted in poor RAM stability on the AM4 platform, and probably also does so on AM5, but that is just my guess.
Targon - Friday, May 26, 2023 - link
There is a difference between allowing the user to do stupid things, and the BIOS by default doing stupid things. This goes back to the old idea of AMD having supported freedom by allowing motherboard makers to tune things themselves, but when those motherboard makers completely screw up and don't even read the, "you shouldn't go over 1.3V" guidance, causing things to go horribly wrong, then AMD had to remove some of those freedoms.
Remember as well that Intel had lots of time to really focus on allowing a lot of voltage to their chips since Intel went from 6th to 10th generation on the same CPU design, and only factory overclocking(more clock speed but also needing more voltage) made newer chips actually faster from those generations. AMD hasn't had to do that for quite a while, and the Ryzen improvements since the Zen+ days to now have all been design improvements, combined with benefits that come from using better fab processes(lower voltages, higher clock speeds, etc).
Realistically, there are some failsafes in place, but if the chip gets damaged due to excessive voltage, the failsafes in place seem to have broken down. It's like a fire killing your smoke detector, and as a result, you get no warning that your house is burning down.
edzieba - Wednesday, May 17, 2023 - link
"So what IS the real issue here?"
- No overvoltage limits (or limits set far above hardware-bricking levels) in hardware or in AMD's AGESA, from launch
- No QC step by AMD and/or motherboard and/or DIMM vendors confirming voltage setpoints for EXPO do not exceed limits
Or worse
- No published voltage limits (or published limits incorrect) so everyone involved was flying blind in setting voltages in the first place
That every motherboard manufacturer simultaneously and independently decided to exceed core voltage limits seems extraordinarily unlikely. More likely is that they all believed based on information from AMD that they were operating within safe voltage ranges, and subsequently optimised voltages for speed and stability over power consumption (as they have been doing for years with XMP) unaware of Ryzen's vulnerability.
Targon - Friday, May 26, 2023 - link
AMD had given the guidance to the motherboard makers, but Asus clearly ignored that information. Further, when the X3D chips came out, AMD again would have had to tell the motherboard makers, "for this chip, these are the safe voltages!", and again, Asus dropped the ball, while clearly, ASRock and most others did not. If anything, that proves that ASRock is no longer that "low end garbage" brand that they were 20 years ago.
haplo602 - Wednesday, May 17, 2023 - link
Thing is, nobody as of now explained why only 7800X3D burned out ... no other model did that ... Even GN did not try as their investigation was clearly in the clickbait and spectacle direction and not the scientific explanation direction ...
meacupla - Wednesday, May 17, 2023 - link
Well the X3D vs regular is pretty obvious. Regular 7000 series are not as heat sensitive as X3D chips, since they don't have 3D V-cache sitting on top of the CPU.
Between the X3D chips, it's not so obvious, since it could be any number of factors, including how the 7900X3D and 7950X3D are dual chiplets of dissimilar chips, how the BIOS was handling vsoc between the various CPUs, the most popular RAM configuration on those two (ie 16GB at 6000 vs 32~64GB at 3600~4800), etc.
Trying to destructively test a 7900X3D and 7950X3D is going to be very expensive, very quick.