Voltage Lockdown: Investigating AMD's Recent AM5 AGESA Updates on ASRock's X670E Taichi
by Gavin Bonshor on May 16, 2023 12:00 PM EST- Posted in
- CPUs
- AMD
- ASRock
- Motherboards
- X3D
- Ryzen 7000
- X670E Taichi
- AGESA
It's safe to say that the last couple of weeks have been a bit chaotic for AMD and its motherboard partners. Unfortunately, it's been even more chaotic for some users with AMD's Ryzen 7000X3D processors. There have been several reports of Ryzen 7000 processors burning up in motherboards, and in some cases, burning out the chip socket itself and taking the motherboard with it.
Over the past few weeks, we've covered the issue as it's unfolded, with AMD releasing two official statements and motherboard vendors scrambling to get their users onto updated firmware in what feels like a grab-it-quick fire sale, pun very much intended. Not everything has gone according to plan, with AMD releasing two new AGESA firmware updates through its motherboard partners within the space of a week to try and address the issues.
The first firmware update made available to vendors, AGESA 1.0.0.6, addressed reports of SoC voltages being too high. This AGESA version put restrictions in place to limit that voltage to 1.30 V, and was quickly distributed to all of AMD's partners. More recently, motherboard vendors have pushed out even newer BIOSes which include AMD's AGESA 1.0.0.7 (BETA) update. With even more safety-related changes made under the hood, this is the firmware update AMD and their motherboard partners are pushing consumers to install to alleviate the issues – and prevent new ones from occurring.
In this article, we'll be taking a look at the effects of all three sets of firmware (AGESA 1.0.0.5c through 1.0.0.7) running on our ASRock X670E Taichi motherboard. The goal is to uncover what changes, if any, there are to key variables when using the AMD Ryzen 9 7950X3D, including SoC voltages and current drawn under intensive memory-based workloads.
Here is our recent coverage of the Ryzen 7000X3D/7000 'burnout' issues, including two statements from AMD and official responses from ASUS and MSI:
- MSI Addresses CPU Voltages on AM5 Motherboards for Ryzen 7000X3D Processors
- AMD Issues Official Statement on Reported Ryzen 7000 Burnout Issues
- ASUS Issues Statement on Ryzen 7000X3D Processor Issues, Possible Voltage Issues with AMD EXPO
- AMD Issues Second Statement on Ryzen 7000 Burnout Issues: Caps SoC Voltages
AMD Ryzen 7000 AGESA Firmware: From 1.0.0.5c to 1.0.0.7 Within 32 Days
The first firmware update made available to vendors, AGESA 1.0.0.6, addressed reports of SoC voltages being too high, with new restrictions put in place to limit things to 1.30 V. In the case of the board we've been using to try and dig deeper into the issues, the ASRock X670E Taichi, this was made available to the public on 4/27/23 through its 1.21 firmware update. More recently, on 5/4/23, ASRock made its latest 1.24.AS02 firmware available, which includes AMD's AGESA 1.0.0.7 (BETA) update.
The AGESA 1.0.0.7 (BETA) update is the firmware that AMD has most recently rolled out to alleviate the burnout issues, not just for Ryzen 7000X3D chips with 3D V-Cache, but also across the broader Ryzen 7000 and AM5 ecosystem. Counting the initial AGESA 1.0.0.5c firmware that brought Ryzen 7000X3D support to AM5 motherboards, AMD has released a total of three major AGESA versions in the space of a mere 32 days, all of which ASRock has dutifully published for the X670E Taichi. We'll be using these three versions as the basis for our analysis as we look into what's going on.
On top of this, AMD is also planning to release an even more robustly updated AGESA firmware, which could arrive in the coming weeks. This is referred to internally as AGESA 1.0.0.9; we did reach out to AMD for comment on it, but our rep couldn't comment on "unannounced or internal only software stacks." It should also be noted that the current firmware available to users at the time of writing is a BETA version, implying that a newer AGESA is undoubtedly on its way. Still, the timescale of that release is anyone's guess currently.
Looking at the variations in AMD's AGESA updates over the last month, there hasn't been any official indication of changes beyond the bare minimum, at least not from ASRock's descriptions. The following is how ASRock describes each of the AGESA updates:
- AGESA 1.0.0.5c: Initial support for Ryzen 7000X3D processors with 3D V-Cache.
- AGESA 1.0.0.6 (BETA): Improved memory compatibility, Optimizations for Ryzen 7000X3D, recommended update for Ryzen 7000X3D processors.
- AGESA 1.0.0.7 (BETA): Support for 48/24GB DDR5 memory modules.
The description of the changes, at least from the point of ascertaining what each AGESA is offering, is borderline pitiful. In none of the descriptions does it state what changes AMD has made to each AGESA firmware to address the current issues, which in all honesty, is a pretty big thing to omit. There are no indications whatsoever on ASRock's X670E Taichi BIOS page as to what each firmware changes, and with no public notes available to users, it's a case of "update to this firmware, it's recommended."
So what do we know about the changes? Well, we know the critical change going from AGESA 1.0.0.5c to the 1.0.0.6 and 1.0.0.7 versions is a lockdown on SoC voltage to 1.30 V. Previously, on the ASRock X670E Taichi with 1.0.0.5c, we were able to set the SoC voltage to 2.5 V, which would almost certainly result in frying our X3D chips like an egg.
Image Credit: Igor Wallossek, Igorslab.de
As for the other changes coming with AGESA 1.0.0.6, according to Igor Wallossek, the Editor-in-Chief of Igorslab.de, AMD has also added two new PROCHOT entries aimed directly at combating overheating. PROCHOT essentially means Processor Hot, and it is a control mechanism designed to protect the processor from overheating. There are two implementations here. The first is the PROCHOT Control mechanism, which is precisely what it says on the tin. When the CPU hits a defined threshold, a PROCHOT Control signal is sent, and the CPU draws less power to try and bring temperatures down and reduce the risk of damage.
The second mechanism is PROCHOT Deassertion Ramp Time, which dictates how long a processor can ramp up the power after the initial PROCHOT Control signal has been disabled. Essentially, PROCHOT Deassertion Ramp is the time it takes for the processor to get back up to normal parameters, and different variables, including cooling, the aggressiveness of said cooling, and general heat dissipation quality, can dictate this time. If the processor is inadequately cooled, this can result in a longer deassertion ramp time, whereas more aggressive heat dissipation methods should theoretically allow for a quicker ramp-up time.
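To make the interplay between the two PROCHOT settings a little more concrete, here is a toy Python model of the behavior described above. It is only a sketch: the trip temperature, power limits, ramp time, and the linear shape of the ramp are all made-up assumptions for illustration, not AMD's actual values or firmware logic.

```python
from dataclasses import dataclass


@dataclass
class ProchotModel:
    """Toy PROCHOT model; every number here is a hypothetical placeholder."""
    trip_temp_c: float = 95.0        # assumed assertion threshold
    full_power_w: float = 120.0      # assumed normal power limit
    throttled_power_w: float = 60.0  # assumed limit while PROCHOT is asserted
    ramp_time_s: float = 4.0         # assumed deassertion ramp time
    asserted: bool = False
    since_deassert_s: float = 0.0

    def __post_init__(self) -> None:
        # Start out fully ramped, as if PROCHOT had never fired.
        self.since_deassert_s = self.ramp_time_s

    def step(self, temp_c: float, dt_s: float) -> float:
        """Advance by dt_s seconds and return the allowed power in watts."""
        if temp_c >= self.trip_temp_c:
            # PROCHOT Control: assert the signal and cut power immediately.
            self.asserted = True
            self.since_deassert_s = 0.0
            return self.throttled_power_w
        if self.asserted:
            # Temperature fell below the trip point: deassert, start the ramp.
            self.asserted = False
            self.since_deassert_s = 0.0
        else:
            self.since_deassert_s = min(self.ramp_time_s,
                                        self.since_deassert_s + dt_s)
        # Deassertion ramp: climb linearly back to full power.
        frac = 1.0 if self.ramp_time_s <= 0 else self.since_deassert_s / self.ramp_time_s
        return self.throttled_power_w + frac * (self.full_power_w - self.throttled_power_w)


model = ProchotModel()
for temp in (80, 96, 90, 90, 90, 90, 90):
    print(f"{temp} C -> {model.step(temp, dt_s=1.0):.0f} W allowed")
```

In this model, a longer `ramp_time_s` simply stretches out how long the chip takes to return to its normal power limit after the signal clears.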
The Story So Far: Gamers Nexus Deep-Dive - The Ryzen 7000 CORE Fundamental Issues
Before the rollout of new firmware, Steve Burke, the Editor-in-Chief of Gamers Nexus, and his team investigated the issues in-depth, including looking at the original fried hardware from Speedrookie. This includes a faulty and bulged-out Ryzen 7 7800X3D processor and his burnt ASUS ROG STRIX X670E-E Gaming motherboard. Instead of leaving Speedrookie to RMA the hardware, Steve Burke reached out and offered to buy it from him, sparing him the RMA lead time and allowing him to purchase new hardware.
The 38:46-long video is a very good watch, and we certainly recommend it, especially to those more interested in the inner workings (or issues) of the Ryzen 7000X3D and 7000 series processors. To summarize Steve's findings, we took away the following points:
- AMD Ryzen 7000X3D CPUs are shutting down too late to mitigate physical damage.
- ASRock, GIGABYTE, and MSI have a 116°C thermal trip point and ASUS has 106°C, but these sometimes didn't work as intended.
- The thermal cut-off for Ryzen 7000X3D is supposed to be 106°C and 116°C for Ryzen 7000.
- AMD EXPO enabled on ASUS is 1.35V on SoC voltage up until BIOS 1202 (AGESA 1.0.0.6).
- ASUS's SoC Voltage settings were/are too high.
- The AGESA firmware rollout has been nothing short of chaos at this point.
- AMD is offering RMA (paying shipping both ways) on killed CPUs, even if EXPO has been used (at least in the US)
- No word on if motherboard vendors will honor the warranty (at the time of writing)
While Steve and his team at Gamers Nexus have gone deep into uncovering the root causes of the problem, one thing remains abundantly clear: the issue is not just one that relates to SoC voltage. There has certainly been some confusion between AMD themselves and its motherboard partners in implementing the appropriate failsafe to prevent the CPU (and motherboard socket, for that matter) from burning into oblivion.
The other problem relates to ASUS here, with a more aggressive implementation of its SoC voltages, which Gamers Nexus confirmed in their testing as running too high. Before the AGESA firmware (1.0.0.6) update through BIOS version 1202, ASUS was overshooting SoC voltage by 0.05 V over AMD's newly imposed SoC voltage limit of 1.3 V.
Image Credit: Gamers Nexus
With leads soldered to the motherboard and connected to a digital multimeter, a 1.35 V SoC setting within the ASUS firmware (and with EXPO enabled) resulted in an observed 1.398 V at an SoC pad. This was typically even higher when probed at the choke, at an eye-watering 1.42 V. This points to a fundamental problem: ASUS's firmware and the SoC rails themselves aren't cohabiting well with each other. An additional 0.05 V on top of the recommended 1.30 V is a lot, to say the least, but adding roughly another 0.05 V on top of that can undoubtedly lead to dielectric degradation and possibly to dead CPUs and burnt motherboard sockets.
Doing some preliminary testing on the effect of SoC voltage on stability with the latest AGESA 1.0.0.7 (BETA) firmware, we found that the ASRock X670E Taichi would automatically preset 1.30 V on the SoC when applying the EXPO memory profile of our G.Skill DDR5-6000 memory kit (2 x 16 GB). Trying lower values, 1.15 V was unfortunately a no-go, and even 1.20 V was a no-go. We eventually settled on 1.25 V on the SoC for this kit and our Ryzen 9 7950X3D, where stability in memory-intensive benchmarks was solid.
Perhaps one of the biggest things to come out of Gamers Nexus's testing was that AMD is now offering RMA support for users who have used EXPO memory profiles, something which normally voids the warranty on AMD's processors. Whether or not these RMA requests will be honored in regions outside the US hasn't been confirmed, but it's unlikely to be an issue.
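To put those numbers side by side, the short Python snippet below works out how far each figure sits above AMD's 1.30 V cap. The voltages are the ones quoted above from Gamers Nexus's testing; the helper name `overshoot` is just ours for illustration.

```python
AMD_SOC_LIMIT_V = 1.30   # SoC cap introduced with AGESA 1.0.0.6
ASUS_SETTING_V = 1.35    # ASUS SoC setting with EXPO enabled, before BIOS 1202
MEASURED_PAD_V = 1.398   # Gamers Nexus multimeter reading at an SoC pad
MEASURED_CHOKE_V = 1.42  # Gamers Nexus multimeter reading at the choke


def overshoot(measured_v: float, reference_v: float) -> tuple[float, float]:
    """Return (overshoot in volts, overshoot as a percentage of the reference)."""
    delta = measured_v - reference_v
    return round(delta, 3), round(100 * delta / reference_v, 1)


for label, volts in [("firmware setting", ASUS_SETTING_V),
                     ("SoC pad", MEASURED_PAD_V),
                     ("choke", MEASURED_CHOKE_V)]:
    delta_v, pct = overshoot(volts, AMD_SOC_LIMIT_V)
    print(f"{label}: {volts:.3f} V is {delta_v:+.3f} V ({pct:+.1f}%) "
          f"over the {AMD_SOC_LIMIT_V:.2f} V limit")
```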
Still, it's a good gesture for users with damaged CPUs from an issue that is entirely not their fault. Motherboard vendors, on the other hand, operate within their policies and parameters, and it may be trickier getting an RMA on a damaged motherboard simply because AMD doesn't control motherboard vendors' RMA policies. We would hope in good faith that motherboard vendors will honor the warranty in instances of these burnout issues, but we cannot confirm if they will at this time.
Our Testing: Methodology, Test Setup, and Hardware
To summarize the reason for testing AMD's AGESA firmware, we aren't trying to replicate burning our Ryzen 7000X3D samples – enough processors have already been sacrificed for science. For that matter, we certainly didn't see or smell any smoke coming from our ASRock X670E Taichi during testing, so we'll take that as a good sign.
Our purpose for testing is to highlight any differences or variations in parameters and power-related elements coming from AMD's latest AGESA packages. This includes looking at rails like SoC voltage and Package Power Tracking (PPT) output from the AM5 CPU socket. As AMD has dialed down what users and motherboard vendors can apply in regards to SoC voltage to 1.30 V, it's worth noting that all of ASRock's firmware we've tested on the X670E Taichi in this piece automatically sets SoC voltage to 1.30 V. While we don't have the necessary tools and equipment to solder leads to the motherboard to observe 'physical' voltages, we are relying on HWInfo's reporting prowess, as well as looking at multiple temperatures.
We also did some in-house stability testing against the new SoC voltage limits, running a fresh batch of tests on our Ryzen 9 7950X3D paired with a G.Skill DDR5-6000 (2 x 16 GB) memory kit with its AMD EXPO memory profile enabled. We found that things weren't stable until we applied 1.25 V on the SoC voltage within the firmware. At 1.25 V on the SoC, our kit was rock solid, even in memory-intensive workloads and benchmarks.
That has been our focus: pushing the memory as hard as we can to ensure complete stability. A lot of the fanfare surrounding the issue has unfairly singled out AMD's EXPO profiles as one of the causes; they are not. We know that CPU-intensive workloads will generate more heat, but that isn't what we've been investigating. We're looking for variations in current and power between the different firmware versions to see if AMD (and ASRock) has made optimizations within its framework to reduce them. Current, or more specifically overcurrent and the bypassing of the integrated failsafes, is one of the key concerns in the burnouts.
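The step-up approach we used can be sketched in a few lines of Python. To be clear, this is only an illustration of the procedure: `is_stable` is a hypothetical stand-in for manually setting the voltage in the firmware and running a memory-intensive benchmark pass, and there is no automated API behind it.

```python
from typing import Callable, Optional

SOC_LIMIT_V = 1.30  # SoC voltage cap enforced from AGESA 1.0.0.6 onwards


def lowest_stable_voltage(
    candidates_v: list[float],
    is_stable: Callable[[float], bool],
    limit_v: float = SOC_LIMIT_V,
) -> Optional[float]:
    """Return the lowest candidate at or below limit_v that passes is_stable()."""
    for volts in sorted(candidates_v):
        if volts > limit_v:
            break  # never test above the firmware cap
        if is_stable(volts):
            return volts
    return None  # nothing within the cap was stable


# The lambda mirrors what we saw on our kit: 1.15 V and 1.20 V failed, 1.25 V passed.
print(lowest_stable_voltage([1.15, 1.20, 1.25, 1.30], is_stable=lambda v: v >= 1.25))
```

Walking upwards from the lowest candidate means the first pass we hit is also the lowest voltage that works for the kit.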
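For readers who want to run a similar comparison on their own board, sensor readings logged to CSV can be boiled down to a mean and a peak per firmware version with a short Python helper like the one below. This is a sketch under assumptions: the column names (`firmware`, `soc_voltage_v`, `ppt_w`) are hypothetical, and a real HWInfo log uses its own sensor headers, so they would need mapping first.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def summarize(csv_text: str,
              firmware_col: str = "firmware",
              columns: tuple[str, ...] = ("soc_voltage_v", "ppt_w")) -> dict:
    """Return {firmware: {column: (mean, max)}} for the given numeric columns.

    All column names are hypothetical placeholders, not real HWInfo headers.
    """
    samples = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in columns:
            samples[row[firmware_col]][col].append(float(row[col]))
    return {
        fw: {col: (mean(vals), max(vals)) for col, vals in cols.items()}
        for fw, cols in samples.items()
    }
```

Looking at the peak alongside the mean matters here, since a brief excursion in voltage or power is exactly the sort of thing an average would hide.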
Our test bench for our AGESA (AM5) update testing is as follows:
AMD Ryzen 7950X3D AGESA Test Platform | |
CPU | Ryzen 9 7950X3D ($699), 16 Cores, 32 Threads, 120 W TDP |
Motherboard | ASRock X670E Taichi (BIOS 1.18, 1.21 & 1.24.AS02) |
Memory | G.Skill Trident Z5 Neo 2x16 GB: DDR5-5200 (JEDEC Default), DDR5-6000 CL34 (EXPO Profile) |
Cooling | EK-AIO Elite 360 D-RGB 360 mm AIO |
Storage | SK Hynix 2TB Platinum P41 PCIe 4.0 x4 NVMe |
Power Supply | Corsair HX1000 |
GPUs | AMD Radeon RX 6950 XT, Driver 31.0.12019 |
Operating Systems | Windows 11 22H2 |
For our choice of workloads, we're relying on the Memory Test Suite from Openbenchmarking.org via Phoronix to implement our memory-intensive workloads. Although some of these workloads aren't optimized and don't run on Windows, we used the CacheBench benchmark, which uses multiple data types across read, write, modify, and read/write/modify combined. As part of the LLCbench low-level architectural characterization benchmark suite, CacheBench is designed to test memory and cache bandwidth performance and relies on a compilation of C++ Toolchains and compilers.
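To give a feel for what read, write, and modify tests look like, here is a minimal Python sketch of the same three access patterns swept across growing buffer sizes. It is not CacheBench and is not how we gathered our results; the function names are ours, and interpreter overhead dominates in Python, so the figures it prints show the methodology only and say nothing meaningful about real cache bandwidth.

```python
import time
from array import array


def read_kernel(buf: array) -> int:
    """Read every element once; returning the sum keeps the loop honest."""
    total = 0
    for value in buf:
        total += value
    return total


def write_kernel(buf: array, value: int = 1) -> None:
    """Overwrite every element without reading it first."""
    for i in range(len(buf)):
        buf[i] = value


def modify_kernel(buf: array) -> None:
    """Read-modify-write: load each element, increment it, store it back."""
    for i in range(len(buf)):
        buf[i] += 1


def bandwidth_mb_s(kernel, n_bytes: int, repeats: int = 3) -> float:
    """Best-of-N throughput in MB/s for one kernel over an n_bytes buffer."""
    buf = array("q", [0]) * (n_bytes // 8)  # 8-byte signed integers
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(buf)
        best = min(best, time.perf_counter() - start)
    return (n_bytes / 1e6) / max(best, 1e-9)


# Sweep buffer sizes that would span different cache levels on real hardware.
for size in (4 * 1024, 64 * 1024, 1024 * 1024):
    for kernel in (read_kernel, write_kernel, modify_kernel):
        print(f"{size:>8} B  {kernel.__name__:<13} {bandwidth_mb_s(kernel, size):8.1f} MB/s")
```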
Read on for more analysis.
39 Comments
techjunkie123 - Tuesday, May 16, 2023 - link
Any by anandtech reviewer, I meant reader.
TheinsanegamerN - Wednesday, May 17, 2023 - link
As predicted, GN calls out AMD and people start whining.
Silver5urfer - Wednesday, May 17, 2023 - link
GN kisses Nvidia and people shrug off. Also their new Muh Failure rate website page, it does not list the LGA1700 socket engineering failure, but has 12VHPWR as "Fixed per GN standards" is laughable at best as the socket design causes other unwanted behavior along with the HS contact and longevity of the PCB traces. See Buildzoid IMC video on RPL, in short contact issue for the Socket but hey you can use the Thermalgrizzly Contact Frame and fix it while screwing your mobo in the process with non-factory Torque spec funnily Thermalright one is far superior, however since GN said the former is good all people learn the hardway.
Both the OPs are correct, throwing a bunch of zoom images and extrapolating on lack of information with confusing the end user tricking onto some space age analysis does not help. Meanwhile AT's solid pieces on both LGA1700 bendgate and the AGESA on X3D provides far more useful information the Thermal and Electrical behavior from 3 diff AGESA versions is excellent approach to check what is going on than poking in the dark (Lithography, Metallurgy etc), only positive thing to come out of GN was ASUS rolling back their shady tactics esp many YTers called them out.
Anandtech has no rival in how they cover, I wish they reviewed GPUs and staff did not leave (Ian, Andrei etc), but the facts are hard truth, like the YT content killed blogs like this which is a big loss to many but most of the people around the world do not care for in-depth pieces and real Tech Journalism which is not capitalizing on the content for clicks. All the reviewers out there just simply copy paste the slide deck OR read their PR guide in virtually all the sites / videos except AT.
cheshirster - Tuesday, May 16, 2023 - link
"we can see that everything is fundamentally well within control"
I've seen an interesting behavior on GB board.
It was trying to prevent high voltage delta between vmem and vsoc when using manual settings.
The parasite current between different voltage lines (in case of too big delta) could be the case of the problem and you won't see the solution working by simply measuring voltages.
meacupla - Tuesday, May 16, 2023 - link
On Ryzen 2000, 3000, and 5000, high voltage delta between Vsoc and Vram did result in poor ram stability. Particularly when the DDR4 required more than 1.35V to run at its rated speed.
Silver5urfer - Tuesday, May 16, 2023 - link
I already knew that Failure Analysis Lab won't do anything. It is known fact that how can an unnamed lab can breakdown the reason of the CPU failure after the fact when they are not the ones associated with the OEM manufacturer of the said processor. Same for that 12VHPWR GN's video which did not yield anything. It's just a shock value capitalization for the maximum hits on the topic the real deal was 12VHPWR Nvidia statement which was given to GN rather like AMD who is giving it from their PR handle directly, allowing Nvidia to shrug off.
Anyways moving on the AMD perhaps did not do proper verification as this is their First take at EXPO and the Zen 4 processor outside the Server HPC space where they are severely limited to add the extra consumer features like Overclocking, and Intel has an edge here because Intel has been doing the OC business since their Core series processors debuted on DDR3 which means literally 2-3 generations of Memory Overclock experience. Plus they sponsor HWBot too.
I find Anandtech's conclusions far more useful, yet read non conclusive as stated which is obvious when you are dealing with Microprocessors made in this era where we have ton of variables at play. Plus gives a good insight on how the Current, Voltage and Temperatures are being effected thus giving some picture of the inner workings of the CPU which we cannot really ever know because one AMD does not provide documentation / datasheets like Intel, two is these are bleeding edge tech, it's hard to know many things esp when you have a non monolithic design, on Intel it's easy and esp Intel can fix the clock rate as in user can do it. Plus no uncore sitting on a different piece of die (may change in MTL and ARL in the future).
So yea this is a great piece on how the AGESA varies than a nice Electron Microscope zoom picture content with a ton of VLSI terminology thrown at the user to confuse them. However I do agree with GN's ASUS part 100%, because that company has been complete pile of rubbish nowadays. I had to return a lot of Z590 boards because their Mobo PCB paint was chipped off on brand new APEX boards. Then the whole BIOS problems associated with ASUS - ROG Forums are a disaster now, they killed the site with mobile focus. ASUS implements ARB, Anti Roll Back, forcing you to get restricted to a BIOS this is very bad because on Z590 their boards had RTX40 series PCIe4.0 issue as in they did not run on 4.0 speed. Beta BIOS had the fix but actual update did not and they had ARB on the actual one, that's how bad ASUS is, some of them cannot even be rolled back even if you use BIOS Flashback. The Armory Crate is a cancer software which you cannot get rid of due to Registry into deep OS stack. Same like Intel XTU (Should use Throttlestop which is leagues ahead).
All in all it's unfortunate situation AMD should improve this and gain from this experience, they learned a lot with Zen 3, the IODie was a mess on it now Zen 4 is solid in that dept esp when they reduced the Memory variables from 3 values (fclk, mclk, uclk) in zen 3 to zen 4 now only has 2 resulting in a stable I/O handling. Plus the significantly higher clock ratio etc.
Silver5urfer - Tuesday, May 16, 2023 - link
More clarification,
Nvidia's 12VHPWR ultimately was flawed as Intel ATX 3.0 power revises the 4 Sense pins to be elongated plus use of the Tulip design vs the Dot which was mentioned by Igor and ignored by GN. And the fact that AMD's R9 295 X2 on Anandtech here pulls 500W using just 2x8Pin standard further reinforcing that 12VHPWR is a clearly rushed one, and the fact that RTX 3090Ti does not have this problem because of lack of 4 Sense pins thus limiting the hard power cap.
And about ASUS, to add after getting caught they are now issuing a PR that all Mobos with EXPO and Beta also are covered under warranty. GN take did something good they have a long way to go esp how their BIOS is top notch yet they cram too much voltage into every possible way. Taking advantage of the brand value and consumer mindset.
Hairs - Saturday, May 27, 2023 - link
Igor's analysis on the 12vHPWR was absolute guesswork based on looking at some mobile phone pictures posted to reddit.
The failure analysis lab pointed out that tulip vs dot would not provide sufficient difference in power connector stability to cause the issue on its own, as suggested by Igor.
GN is the only tech outlet that did any actual testing on the cables, and they did this not just by sending one out to a professional lab for verification, but by doing individual unit tests on multiple physical tables to re-create possible error scenarios. Of all the possiblities (and they tested different cables from different vendors) the only one which reliably recreates the burnout is when the connector is both not fully seated, and also sits at a slight angle in the socket. Both of these are user-error problems, but the user error is compounded by the fact that the physical security of the socket (not the internal design of the pin connectors) isn't robust enough compared to the old 8-pin design.
Literally everyone else was guessing. Only GN actually tried to recreate the problem and validate what was going on.
Hairs - Saturday, May 27, 2023 - link
"The key takeaway is that, at least on the ASRock X670E Taichi, things are working as they should be with AGESA 1.0.0.7 (BETA), and we look forward to a full release (non-BETA) of their latest AGESA in the coming weeks."
Anandtech haven't event tested OCP, which is one of ASUS's primary failures and yet claim "everything" is working. Great conclusion and in-depth analysis there. "Ignore that other reviewer who claims there are multiple problems, I ran HWinfo and it's grand."
Calling GN's work "shock value capitalization" when they were literally the only tech reviewers either on YT or on written blogs who took actual time to analyse things and delayed their content specifically to avoid bandwagon-jumping and pushing a scare narrative that all cards using the connector were guaranteed to burn up is laughable.
Where was Anandtech's deep analysis of the topic? Oh right they haven't done any GPU work in years other than reprint PR releases.
army165 - Tuesday, May 16, 2023 - link
I have a 7800X3D and an Asus B650 board. Should I take it out and inspect it for damage? I upgraded to the 1303 BIOS when I got the board and didn't move to the Beta BIOS 1410 until after Asus redacted their "we won't fix this if you use this" message on the BIOS description.