- No overvoltage limits (or limits set far above hardware-bricking levels) in hardware or in AMD's AGESA, from launch
- No QC step by AMD and/or motherboard and/or DIMM vendors confirming voltage setpoints for EXPO do not exceed limits
Or worse
- No published voltage limits (or published limits incorrect) so everyone involved was flying blind in setting voltages in the first place
That every motherboard manufacturer simultaneously and independently decided to exceed core voltage limits seems extraordinarily unlikely. More likely is that they all believed based on information from AMD that they were operating within safe voltage ranges, and subsequently optimised voltages for speed and stability over power consumption (as they have been doing for years with XMP) unaware of Ryzen's vulnerability.
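As a rough illustration of the missing QC step described above, a setpoint check could be as simple as the following Python sketch. The 1.30 V SoC ceiling is the figure quoted elsewhere in this thread. The function name, the profile names and the example setpoints are hypothetical and are not taken from any vendor tool or AGESA interface.

```python
from decimal import Decimal

# Assumption: 1.30 V is the SoC voltage ceiling mentioned in this thread.
VSOC_LIMIT = Decimal("1.30")


def check_setpoints(setpoints, limit=VSOC_LIMIT):
    """Return {profile: overshoot in volts} for every setpoint above the limit.

    `setpoints` maps a hypothetical profile name to its SoC voltage given as a
    string, so the comparison uses exact decimal arithmetic, not binary floats.
    """
    violations = {}
    for profile, volts in setpoints.items():
        overshoot = Decimal(volts) - limit
        if overshoot > 0:
            violations[profile] = overshoot
    return violations


# Hypothetical example values, for illustration only.
example = {"EXPO-6000": "1.35", "EXPO-5200": "1.25", "manual": "1.30"}
print(check_setpoints(example))
```

With these made-up values only the 1.35 V entry is flagged, at 0.05 V over the limit. A setpoint exactly at 1.30 V passes.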
AMD had given the guidance to the motherboard makers, but Asus clearly ignored that information. Further, when the X3D chips came out, AMD again would have had to tell the motherboard makers, "for this chip, these are the safe voltages!", and again, Asus dropped the ball, while clearly, ASRock and most others did not. If anything, that proves that ASRock is no longer that "low end garbage" brand that they were 20 years ago.
Thing is, nobody has yet explained why only the 7800X3D burned out ... no other model did that ... Even GN did not try, as their investigation was clearly aimed at clickbait and spectacle rather than scientific explanation ...
Well the X3D vs regular is pretty obvious. Regular 7000 series are not as heat sensitive as X3D chips, since they don't have 3D V-cache sitting on top of the CPU.
Between the X3D chips, it's not so obvious, since it could be any number of factors, including how the 7900X3D and 7950X3D are dual chiplets of dissimilar chips, how the BIOS was handling vsoc between the various CPUs, the most popular RAM configuration on those two (ie 16GB at 6000 vs 32~64GB at 3600~4800), etc.
Trying to destructively test a 7900X3D and 7950X3D is going to be very expensive, very quick.
Gamers Nexus's premise about poor SoC overvolting is wrong, so you have to question whether their other findings are reliable. There is an unofficial Asus video on die sense and socket sense. It's not a new thing; they had one a few years back on the same channel. They have even put this feature on their ROG X670E homepage. Video here (use auto-translate for English): https://www.youtube.com/watch?v=l8r4LVV_jsQ
The major takeaway points are: Die Sense is enabled by default on the C8H and other premium ASUS boards, and GN has the same board, which means HWInfo should actually show proper data. This is like those premium Intel boards that have a switch allowing you to read directly from Die Sense on the fly (HWInfo helps to spit out that data). The only difference here is that it's already the default, so you get 2 measurement points to compare!
So GN fked up by using the farthest point, and they did not even bother to check the HWInfo reading while doing their big space-age, big-brain investigation and throwing a bunch of terms at the audience to massively confuse them. 1M views are not free, so the sensationalism has to get maximum coverage, skipping all the points in the middle, and it works. It always did, because the avg consumer is a dumb rock.
I presume AT's X670E Taichi is a similar design, so the VSoC reading is accurate (guessing). Now if you go to Igor's Lab, they have a Gigabyte Aorus X670E which also uses mobo read points. They make a similar mistake to GN, using the farthest board readouts to measure the voltages, which again gives them the wrong picture of GB also shoving in 0.03-0.05 volts more despite the new AGESA, and they are ignoring the HWInfo readings, as I see only one measurement result from them too. Why not just double check the HWInfo readings instead of going for a single measurement point?
So you boot up the system into BIOS/UEFI, change the settings you want to test, and then it reboots and fries the CPU right away ... HOW do you get anything from HWInfo there when you did not even make it to Windows with a functional CPU? But I am sure you would figure out a way, genius ...
You seem to completely miss the point, let alone understand this. I'm talking about the behavior, while you are talking about a scenario where all CPUs are dying, so boo hoo, I cannot get a readout. If I had an Intel board I'd know, because as I already mentioned, I know Die Sense exists directly on the Apex.
The key takeaway for me is that the majority of mobo makers caused the burnouts by automatically bumping the SoC voltages too high when EXPO is enabled. It does not surprise me at all that Asus had excessive voltage. IME they always push the envelope to get minutely better performance numbers and great reviews. It does not surprise me that ASRock used a proper mobo design. They have been doing this for many years IME.
And that's likely all you need. It's both the minimum and maximum for me - I wasn't able to go beyond 1.25V (which incidentally measured as 1.272V...) without running into issues, while going below it showed computation errors in y-Cruncher's HNT test - a great tool for diagnosing Infinity Fabric instability, which also applies to BOINC tasks.
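The gap between a configured and a measured voltage, like the 1.25 V setting that read back as 1.272 V above, can be expressed as an absolute and a relative offset. This is a minimal Python sketch and the function name is hypothetical.

```python
from decimal import Decimal


def voltage_offset(set_v, measured_v):
    """Return (absolute offset in volts, offset as a percentage of the set value)."""
    set_d, measured_d = Decimal(set_v), Decimal(measured_v)
    delta = measured_d - set_d
    percent = delta / set_d * 100
    return delta, percent


# Values quoted in the comment above: 1.25 V set, 1.272 V measured.
delta, percent = voltage_offset("1.25", "1.272")
print(f"offset: {delta} V ({percent:.2f}%)")
```

For the quoted figures this works out to 0.022 V, or 1.76% above the configured value.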
39 Comments
dullard - Tuesday, May 16, 2023 - link
I'm confused by the voltages on the first page. This article repeatedly mentions 0.5 V, when I think it intends to say 0.05 V. For example, 1.35 V is 0.05 V over the new 1.30 V.
Ryan Smith - Tuesday, May 16, 2023 - link
You are correct. Thanks!
Threska - Tuesday, May 16, 2023 - link
Seems there should have been physical safeties built into the CPU so at worst the chip would shut down and set a fault flag indicating what the problem was. Annoying, but cheaper than replacing burnt up hardware, and ruined reputations.
Samus - Wednesday, May 17, 2023 - link
This is AMD we're talking about here, not Intel. AMD chips going back to the Athlon XP have always lacked fail safes when compared to the competition.
Netmsm - Thursday, May 25, 2023 - link
what a judgment!
cheshirster - Tuesday, May 16, 2023 - link
"Gamers Nexus Deep-Dive - The Ryzen 7000 CORE Fundamental Issues"
If you're not new to semiconductors, this video contains absolutely nothing other than some nice die shots.
There are zero specific details on the matter of burned 7000 CPU's.
It's basically: "Chips can die from numerous reasons and also we have nice shot of small AMD logo".
techjunkie123 - Tuesday, May 16, 2023 - link
The most annoying thing about these GN videos is that they act as if they really know a lot, when in fact they probably know less than the average anandtech reviewer. Not just this video, but other ones too.
TheinsanegamerN - Wednesday, May 17, 2023 - link
Which one are you referring to, the one that left 2 years ago or the one that left 5 years ago?
alpha754293 - Wednesday, May 17, 2023 - link
This is so stupid.
What's your beef with GN?
GN EXPLICITLY STATES that they are learning and that they're NOT experts in failure analysis.
This is why they, also EXPLICITLY state, they sent it out to an external lab, so that someone who IS a failure analysis expert, can actually do the detailed, technical, failure analysis.
You seem to have a chip on your shoulder when it comes to GN which has caused your panties to get all bunched up into a wad.
Skeptical123 - Thursday, May 18, 2023 - link
lol it looks like you got that backwards
techjunkie123 - Tuesday, May 16, 2023 - link
And by anandtech reviewer, I meant reader.
TheinsanegamerN - Wednesday, May 17, 2023 - link
As predicted, GN calls out AMD and people start whining.
Silver5urfer - Wednesday, May 17, 2023 - link
GN kisses up to Nvidia and people shrug it off. Also, their new "Muh Failure rate" website page does not list the LGA1700 socket engineering failure, yet has 12VHPWR as "Fixed per GN standards", which is laughable at best, as the socket design causes other unwanted behavior along with the HS contact and the longevity of the PCB traces. See Buildzoid's IMC video on RPL; in short, it is a contact issue with the socket. But hey, you can use the Thermal Grizzly contact frame and fix it while screwing up your mobo in the process with a non-factory torque spec. Funnily, the Thermalright one is far superior, however since GN said the former is good, people all learn the hard way.
Both the OPs are correct: throwing out a bunch of zoomed images, extrapolating from a lack of information, and confusing the end user with some space-age analysis does not help. Meanwhile, AT's solid pieces on both LGA1700 bendgate and the AGESA on X3D provide far more useful information. Checking the thermal and electrical behavior across 3 different AGESA versions is an excellent approach to see what is going on, rather than poking in the dark (lithography, metallurgy, etc.). The only positive thing to come out of GN was ASUS rolling back their shady tactics, especially as many YTers called them out.
Anandtech has no rival in how they cover things. I wish they reviewed GPUs and that staff had not left (Ian, Andrei, etc.), but the hard truth is that YT content killed blogs like this, which is a big loss to many. Most people around the world do not care for in-depth pieces and real tech journalism that is not capitalizing on content for clicks. Virtually all the other sites and videos simply copy-paste the slide deck or read from the PR guide; AT is the exception.
cheshirster - Tuesday, May 16, 2023 - link
"we can see that everything is fundamentally well within control"
I've seen an interesting behavior on a GB board.
It was trying to prevent a high voltage delta between vmem and vsoc when using manual settings.
The parasitic current between different voltage lines (in case of too big a delta) could be the cause of the problem, and you won't see the solution working by simply measuring voltages.
meacupla - Tuesday, May 16, 2023 - link
On Ryzen 2000, 3000, and 5000, a high voltage delta between Vsoc and Vram did result in poor RAM stability, particularly when the DDR4 required more than 1.35V to run at its rated speed.
Silver5urfer - Tuesday, May 16, 2023 - link
I already knew that the failure analysis lab wouldn't achieve anything. How can an unnamed lab break down the reason for a CPU failure after the fact when it is not associated with the manufacturer of said processor? Same for GN's 12VHPWR video, which did not yield anything. It's just shock value capitalization for maximum hits on the topic. The real deal was the 12VHPWR Nvidia statement, which was given to GN, unlike AMD, who is giving theirs directly from their PR handle, allowing Nvidia to shrug it off.
Anyway, moving on: AMD perhaps did not do proper verification, as this is their first take at EXPO and at the Zen 4 processor outside the server/HPC space, where they are severely limited in adding extra consumer features like overclocking. Intel has an edge here because Intel has been doing the OC business since their Core series processors debuted on DDR3, which means literally 2-3 generations of memory overclocking experience. Plus they sponsor HWBot too.
I find Anandtech's conclusions far more useful, even though they read as non-conclusive, as stated, which is to be expected when you are dealing with microprocessors made in this era, where we have a ton of variables at play. Plus it gives good insight into how the current, voltage and temperatures are being affected, giving some picture of the inner workings of the CPU, which we cannot really ever know because, one, AMD does not provide documentation / datasheets like Intel does, and two, this is bleeding-edge tech. It's hard to know many things, especially when you have a non-monolithic design. On Intel it's easier, especially since the user can fix the clock rate, plus there is no uncore sitting on a different piece of die (this may change with MTL and ARL in the future).
So yeah, this is a great piece on how the AGESA varies, rather than nice electron microscope zoom picture content with a ton of VLSI terminology thrown at the user to confuse them. However, I do agree with GN's ASUS part 100%, because that company has been a complete pile of rubbish nowadays. I had to return a lot of Z590 boards because the PCB paint was chipped off on brand new APEX boards. Then there are the whole BIOS problems associated with ASUS. The ROG Forums are a disaster now; they killed the site with a mobile focus. ASUS implements ARB (Anti Roll Back), restricting you to a BIOS. This is very bad because on Z590 their boards had an RTX 40 series PCIe 4.0 issue, as in the cards did not run at 4.0 speed. The beta BIOS had the fix but the actual update did not, and they had ARB on the actual one. That's how bad ASUS is; some of them cannot be rolled back even if you use BIOS Flashback. Armoury Crate is cancer software that you cannot get rid of because it hooks into the registry deep in the OS stack. Same with Intel XTU (you should use ThrottleStop, which is leagues ahead).
All in all it's an unfortunate situation. AMD should improve on this and gain from the experience. They learned a lot with Zen 3: the IO die was a mess on it, and now Zen 4 is solid in that department, especially as they reduced the memory variables from 3 values (fclk, mclk, uclk) in Zen 3 to only 2 in Zen 4, resulting in stable I/O handling. Plus the significantly higher clock ratio, etc.
Silver5urfer - Tuesday, May 16, 2023 - link
More clarification,
Nvidia's 12VHPWR was ultimately flawed, as Intel's ATX 3.0 power spec revises the 4 sense pins to be elongated, plus there is the use of the tulip design vs the dot, which was mentioned by Igor and ignored by GN. The fact that AMD's R9 295X2, reviewed here on Anandtech, pulls 500W using just the standard 2x 8-pin further reinforces that 12VHPWR was clearly rushed, as does the fact that the RTX 3090 Ti does not have this problem because it lacks the 4 sense pins, thus limiting the hard power cap.
And about ASUS, to add: after getting caught, they are now issuing a PR stating that all mobos using EXPO and beta BIOSes are also covered under warranty. GN's take did some good there. ASUS has a long way to go, especially as their BIOS is top notch yet they cram in too much voltage in every possible way, taking advantage of their brand value and the consumer mindset.
Hairs - Saturday, May 27, 2023 - link
Igor's analysis on the 12VHPWR was absolute guesswork based on looking at some mobile phone pictures posted to reddit.
The failure analysis lab pointed out that tulip vs dot would not provide sufficient difference in power connector stability to cause the issue on its own, as suggested by Igor.
GN is the only tech outlet that did any actual testing on the cables, and they did this not just by sending one out to a professional lab for verification, but by doing individual unit tests on multiple physical cables to re-create possible error scenarios. Of all the possibilities (and they tested different cables from different vendors) the only one which reliably recreates the burnout is when the connector is both not fully seated, and also sits at a slight angle in the socket. Both of these are user-error problems, but the user error is compounded by the fact that the physical security of the socket (not the internal design of the pin connectors) isn't robust enough compared to the old 8-pin design.
Literally everyone else was guessing. Only GN actually tried to recreate the problem and validate what was going on.
Hairs - Saturday, May 27, 2023 - link
"The key takeaway is that, at least on the ASRock X670E Taichi, things are working as they should be with AGESA 1.0.0.7 (BETA), and we look forward to a full release (non-BETA) of their latest AGESA in the coming weeks."
Anandtech haven't even tested OCP, which is one of ASUS's primary failures, and yet claim "everything" is working. Great conclusion and in-depth analysis there. "Ignore that other reviewer who claims there are multiple problems, I ran HWinfo and it's grand."
Calling GN's work "shock value capitalization" when they were literally the only tech reviewers either on YT or on written blogs who took actual time to analyse things and delayed their content specifically to avoid bandwagon-jumping and pushing a scare narrative that all cards using the connector were guaranteed to burn up is laughable.
Where was Anandtech's deep analysis of the topic? Oh right they haven't done any GPU work in years other than reprint PR releases.
army165 - Tuesday, May 16, 2023 - link
I have a 7800X3D and an Asus B650 board. Should I take it out and inspect it for damage? I upgraded to the 1303 BIOS when I got the board and didn't move to the Beta BIOS 1410 until after Asus redacted their "we won't fix this if you use this" message on the BIOS description.
Golgatha777 - Tuesday, May 16, 2023 - link
I'm on a B650E-F with a 7700X CPU. With all that's gone on here with the X3D parts, I think I'll give it a while to sort things out before I think about upgrading to the 7800X3D. I upgraded to 1406 when I first got my motherboard, and I believe it was the first one with X3D support (until ASUS started daily edits of their BIOS and CPU support lists anyway). I have a very stable system, so I plan to sit on the sidelines and not upgrade my BIOS until there's a non-beta one that's been listed for at least a couple of months.
GreenReaper - Saturday, May 27, 2023 - link
I think if it was actually damaged it would have likely shown up in improper working. Most of the damage seems to have been people manually increasing SoC voltage, being allowed to do so.
The_Assimilator - Tuesday, May 16, 2023 - link
This embarrassing disaster is the cherry on top for the dismal and disappointing Zen 4 launch. AMD managed to replicate the original Zen's rubbish memory controller, but this time around they "fixed" it by allowing board partners to overvolt it through the roof - with the inevitable result. Play stupid games, win stupid prizes.
Sunrise089 - Tuesday, May 16, 2023 - link
I’ve vaguely followed this story but am obviously still somewhat out of the loop.
I appreciate this article, but it seems like the conclusion here could be “ASRock boards don’t suffer from the overvolting issue,” no?
So what IS the real issue here? Is it just that Asus had bad voltage settings applied when users used faster memory? And that Asus just assumed they’d be fine because AMD would have protections in the chip that would prevent damage?
Is there more to it than that? Because otherwise I don’t understand why this is being presented as a general issue affecting AMD and multiple board partners if it’s really only Asus-specific.
meacupla - Wednesday, May 17, 2023 - link
There is more to it than that, yes. The older BIOS allowed 7000X3D to be overvolted when XMP was enabled. Asus was the most egregious, but this same flaw seems to have existed across all vendors.
Asus mobos had a fail safe that didn't kick in properly.
It seems that AMD chips don't have a fail safe that kicks in properly either.
AMD and mobo makers endorse fast RAM speeds, but AMD only "officially" supports DDR5-5200.
To ensure maximum RAM compatibility, Asus likely pushed Vsoc too high to get DDR5-6000 to 6400 to work on their mobos.
A high delta between Vsoc and Vram has resulted in poor RAM stability on the AM4 platform, and probably also does so on AM5, but that is just my guess.
Targon - Friday, May 26, 2023 - link
There is a difference between allowing the user to do stupid things, and the BIOS by default doing stupid things. This goes back to the old idea of AMD having supported freedom by allowing motherboard makers to tune things themselves, but when those motherboard makers completely screw up and don't even read the "you shouldn't go over 1.3V" guidance, causing things to go horribly wrong, then AMD had to remove some of those freedoms.
Remember as well that Intel had lots of time to really focus on allowing a lot of voltage to their chips, since Intel went from 6th to 10th generation on the same CPU design, and only factory overclocking (more clock speed but also needing more voltage) made newer chips actually faster across those generations. AMD hasn't had to do that for quite a while, and the Ryzen improvements since the Zen+ days to now have all been design improvements, combined with benefits that come from using better fab processes (lower voltages, higher clock speeds, etc).
Realistically, there are some failsafes in place, but if the chip gets damaged due to excessive voltage, the failsafes in place seem to have broken down. It's like a fire killing your smoke detector, and as a result, you get no warning that your house is burning down.
edzieba - Wednesday, May 17, 2023 - link
"So what IS the real issue here?"
- No overvoltage limits (or limits set far above hardware-bricking levels) in hardware or in AMD's AGESA, from launch
- No QC step by AMD and/or motherboard and/or DIMM vendors confirming voltage setpoints for EXPO do not exceed limits
Or worse
- No published voltage limits (or published limits incorrect) so everyone involved was flying blind in setting voltages in the first place
That every motherboard manufacturer simultaneously and independently decided to exceed core voltage limits seems extraordinarily unlikely. More likely is that they all believed based on information from AMD that they were operating within safe voltage ranges, and subsequently optimised voltages for speed and stability over power consumption (as they have been doing for years with XMP) unaware of Ryzen's vulnerability.
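To make the missing QC step concrete, here is a minimal sketch of the kind of check that could have flagged out-of-range setpoints. It assumes the 1.30 V SoC ceiling mentioned elsewhere in this thread; the function name, board names, and setpoint values are all made up for illustration and are not measured data or any vendor's actual tooling.

```python
# Hypothetical QC check: flag any memory-profile SoC voltage setpoint above a limit.
# The 1.30 V ceiling is the figure quoted in this thread; the board names and
# setpoints below are invented for illustration, not measured data.

SOC_VOLTAGE_LIMIT_V = 1.30


def find_overvolted(setpoints, limit=SOC_VOLTAGE_LIMIT_V):
    """Return {name: excess_volts} for every setpoint strictly above the limit."""
    return {
        name: round(volts - limit, 3)
        for name, volts in setpoints.items()
        if volts > limit
    }


# Invented example profiles (name -> requested SoC voltage in volts).
example_setpoints = {
    "board_a_expo_6000": 1.25,
    "board_b_expo_6000": 1.35,
    "board_c_expo_6400": 1.40,
}

violations = find_overvolted(example_setpoints)
for name, excess in sorted(violations.items()):
    print(f"{name}: {excess:.3f} V over the {SOC_VOLTAGE_LIMIT_V:.2f} V limit")
```

A check like this only helps if the limit itself is published and correct, which is the third point above.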
Targon - Friday, May 26, 2023 - link
AMD had given the guidance to the motherboard makers, but Asus clearly ignored that information. Further, when the X3D chips came out, AMD again would have had to tell the motherboard makers, "for this chip, these are the safe voltages!", and again, Asus dropped the ball, while clearly, ASRock and most others did not. If anything, that proves that ASRock is no longer that "low end garbage" brand that they were 20 years ago.
haplo602 - Wednesday, May 17, 2023 - link
Thing is, nobody has so far explained why only the 7800X3D burned out ... no other model did that ... Even GN did not try, as their investigation was clearly in the clickbait and spectacle direction and not the scientific explanation direction ...
meacupla - Wednesday, May 17, 2023 - link
Well the X3D vs regular is pretty obvious. Regular 7000 series are not as heat sensitive as X3D chips, since they don't have 3D V-cache sitting on top of the CPU.
Between the X3D chips, it's not so obvious, since it could be any number of factors, including how the 7900X3D and 7950X3D are dual chiplets of dissimilar chips, how the BIOS was handling vsoc between the various CPUs, the most popular RAM configuration on those two (ie 16GB at 6000 vs 32~64GB at 3600~4800), etc.
Trying to destructively test a 7900X3D and 7950X3D is going to get very expensive, very quickly.
haplo602 - Thursday, May 18, 2023 - link
but they brag about the sunk cost and beg for shop purchases throughout the whole video ...
they could have at least tried a regular 7800 in the same mobo and compared the voltage readings to have at least something relevant ...
dan121loveu - Wednesday, May 17, 2023 - link
Gamers Nexus's premise about the SoC overvolt is wrong. You have to question whether their other findings are reliable. There is an unofficial Asus video on die sense and socket sense; it's not a new thing, they had one a few years back on the same channel. They have even put this feature on their ROG X670E homepage. Auto-translated English video here: https://www.youtube.com/watch?v=l8r4LVV_jsQ
Silver5urfer - Wednesday, May 17, 2023 - link
Very interesting and a good video.
The major takeaway points are: Die Sense is enabled by default on the C8H and other premium ASUS boards, and GN has the same board, which means HWInfo should actually show proper data. This is like those premium Intel boards which have a switch that allows reading directly from Die Sense on the fly (HWInfo helps to spit out that data). The only difference here is that it's already the default, so you get 2 measurement points to compare!
So GN fked up by using the farthest point, and they did not even bother to check the HWInfo reading while doing their big space age big brain investigation and throwing a bunch of terms at the audience to massively confuse them. 1M views are not free, so the sensationalism has to get the maximum coverage by skipping all the points in the middle, and it works, it always did, because the avg consumer is a dumb rock.
I presume AT's X670 Taichi is also a similar design, so the VSoC reading is accurate (guessing). Now if you go to Igor's Lab, they have a Gigabyte Aorus X670E which is also using mobo read points. They make a similar mistake to GN, using the farthest board readouts to measure the voltages, which again gives them the wrong picture that GB is also shoving 0.03-0.05 volts more despite the new AGESA, and they are ignoring the HWInfo readings, as I see only one measurement result from them too. Why not just double check the HWInfo readings instead of going for a single measurement point?
Top notch journalism nowadays lmao..
haplo602 - Thursday, May 18, 2023 - link
so you boot up the system into BIOS/UEFI, change the settings you want to test and then it reboots and fries the CPU right away ... HOW do you get anything from HWinfo there when you did not even make it to Windows with a functional CPU ? but I am sure you would figure out a way genius ...
Silver5urfer - Thursday, May 18, 2023 - link
You seem to completely miss the point, let alone understand this. I'm talking about the behavior, while you are talking about a scenario where all CPUs are dying and so boo hoo I cannot get a readout. If I had an Intel board I'd know it because, as I already mentioned, I know Die Sense exists on the Apex.
Techie2 - Wednesday, May 17, 2023 - link
The key takeaway for me is that the majority of mobo makers caused the burnouts by automatically bumping the SoC voltages too high when EXPO is enabled. It does not surprise me at all that Asus had excessive voltage. IME they always push the envelope to get minutely better performance numbers and great reviews. It does not surprise me that Asrock used a proper mobo design. They have been doing this for many years IME.
dicobalt - Thursday, May 18, 2023 - link
This reminds me of when Intel's Core was first released and the memory controller would get easily fried. I was one of the fryers.
biostud - Friday, May 19, 2023 - link
I'm using the 1.21 BIOS with 1.0.0.6 AGESA in my ASRock X670E PRO RS, and it only applies 1.25V for vSoC on my 7800X3D.
GreenReaper - Saturday, May 27, 2023 - link
And that's likely all you need. It's both the minimum and maximum for me - I wasn't able to go beyond 1.25V (which incidentally measured as 1.272V...) without running into issues, while going below it showed computation errors in y-Cruncher's HNT test - a great tool for diagnosing Infinity Fabric instability, which also applies to BOINC tasks.