Latency Testing is Hard (RDNA 3 Power Saving)

In a previous article, we compared Infinity Cache latency between the RX 7900 XTX and the smaller RX 7600. After further testing, a correction is in order. AMD’s RDNA 3 architecture uses aggressive power saving techniques. Part of this seems to involve lowering the Infinity Fabric clock when few requests are outstanding. Unfortunately, this power saving complicates microbenchmarking, especially attempts to measure Infinity Cache latency. This short post addresses the issue.

RX 7900 XTX (Navi 31)

The top end RX 7900 XTX uses the Navi 31 chip, which has a chiplet setup. Navi 31’s shader array and all caches up to L2 sit on a 5 nm Graphics Compute Die (GCD). Infinity Cache and memory controllers sit on smaller Memory Cache Dies (MCDs).

From the scalar side, our initial results measured 161 ns of Infinity Cache latency. Subsequent testing with a triple monitor setup (which forces the card into a higher power state) gives a much lower 128.4 ns of Infinity Cache latency.

We see a similar pattern with vector accesses, where measured Infinity Cache latency dropped from 199 ns to 150.3 ns with the card in a higher power state. However, caches on the shader clock domain do not see a notable latency difference. That includes the L0, L1, L2, and scalar caches.
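To put the size of the correction in perspective, here is a minimal sketch that computes how far the measured Infinity Cache latency dropped once the card was in its higher power state. It only uses the four figures quoted above; the function and variable names are our own, chosen for illustration.

```python
def percent_reduction(before_ns: float, after_ns: float) -> float:
    """Percent drop in latency going from before_ns to after_ns."""
    return (before_ns - after_ns) / before_ns * 100.0


# Infinity Cache latency in ns: (initial result, higher power state)
infinity_cache_ns = {
    "scalar": (161.0, 128.4),
    "vector": (199.0, 150.3),
}

for path, (initial_ns, high_power_ns) in infinity_cache_ns.items():
    drop = percent_reduction(initial_ns, high_power_ns)
    print(f"{path}: {initial_ns} ns -> {high_power_ns} ns ({drop:.1f}% lower)")
```

In other words, the initial numbers overstated Infinity Cache latency by roughly a fifth to a quarter of the measured value.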

Navi 31 likely manages power states separately for the Infinity Fabric and shader array. The shader array clocks up if it has work queued up. The Infinity Fabric also checks if it has work queued up, and adjusts its clocks without regard to whether the shader array is busy.
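The toy model below sketches why that kind of independent power management would trip up a latency test. It is purely illustrative: the `ClockDomain` class, the two-state "low"/"high" policy, and the queue depth thresholds are all assumptions made up for this example, not AMD's actual policy. The only idea taken from the text is that each domain looks at its own queued work.

```python
from dataclasses import dataclass


@dataclass
class ClockDomain:
    """One independently managed clock domain in a toy model."""
    name: str
    busy_threshold: int  # hypothetical: queued requests needed to clock up

    def state(self, queued_requests: int) -> str:
        # Each domain only looks at its own queue, never the other domain's.
        return "high" if queued_requests >= self.busy_threshold else "low"


# Thresholds are made-up placeholders, not AMD's actual policy.
shader_array = ClockDomain("shader array", busy_threshold=1)
infinity_fabric = ClockDomain("Infinity Fabric", busy_threshold=8)


def power_states(shader_queue: int, fabric_queue: int) -> dict:
    """Return the state each domain would pick for the given queue depths."""
    return {
        shader_array.name: shader_array.state(shader_queue),
        infinity_fabric.name: infinity_fabric.state(fabric_queue),
    }


# A latency test typically keeps a single dependent access in flight.
latency_test = power_states(shader_queue=1, fabric_queue=1)
# A bandwidth-heavy workload keeps many requests outstanding.
bandwidth_test = power_states(shader_queue=64, fabric_queue=64)

print(latency_test)
print(bandwidth_test)
```

Under these assumptions, the latency test leaves the shader array clocked up while the Infinity Fabric stays in its low state, which is consistent with seeing a latency difference only beyond L2.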

RX 7600 (Navi 33)

The RX 7600 features a smaller RDNA 3 implementation, built on a monolithic 6 nm die. It has a similar power saving strategy, but the effects are very different. Infinity Cache latency barely changes, with just a 14.4% increase at its lower power state. VRAM latency, however, takes a massive jump.

Vector accesses show similar behavior, with just a 9.5% Infinity Cache latency difference depending on whether the Infinity Fabric was in its power saving state. VRAM latency, meanwhile, more than doubles.

RDNA 3’s power saving strategy has different effects on Navi 33 and Navi 31. Navi 33’s monolithic design may contribute to the difference.

Big and Small RDNA 3 Compared

Previously, we noted that the 7900 XTX’s Infinity Cache was a lot slower than the RX 7600’s. With the revised data, the gap is not as large. With scalar accesses, Navi 31’s Infinity Cache is roughly 9.7% slower than Navi 33’s.

Vector accesses paint a similar picture, with a 9.3% latency difference between the two RDNA 3 implementations. That gap is in line with the difference between Navi 21 (RX 6900 XT) and Navi 23 (RX 6600 XT).

Surprisingly, VRAM latencies are also very similar across small and large RDNA 3 implementations.

RDNA 3 and RDNA 2 Compared

AMD introduced its large Infinity Cache in RDNA 2, so it’s interesting to see how their second generation Infinity Cache implementation compares. A slide from AMD suggests that Infinity Cache hit latency went down from RDNA 2 to RDNA 3. Even though a chiplet interface introduces a latency penalty, AMD was able to overcome that with higher clocks.

AMD’s slide, indicating that they overcame the latency penalty of a chiplet link via higher Infinity Fabric clocks

Our updated data lines up with AMD’s claim. RDNA 3’s Infinity Cache provides roughly 13.2% lower latency than RDNA 2’s for scalar accesses, when checking the same 32 MB test size.

With vector accesses, the 64 MB test size shows a 9.67% latency reduction in favor of RDNA 3. That almost exactly lines up with AMD’s common case claim.

In VRAM, the RX 7900 XTX consistently achieves better latency than the RX 6900 XT. At the 1 GB test size, the RX 7900 XTX achieves 221.24 ns and 234.55 ns access latency for scalar and vector accesses, respectively. For comparison, the RX 6900 XT gets 260 and 283.89 ns of latency for scalar and vector accesses. AMD should be proud of their achievement, as the RX 7900 XTX achieves better VRAM latency than any other big AMD GPU I’m aware of. It also puts their VRAM access latency very close to that of the RTX 4090 and GTX 1080.
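For a sense of scale, the sketch below turns the 1 GB figures quoted above into relative improvements. It uses only the four latency values from the text; the helper name and dictionary layout are our own.

```python
# VRAM latency at the 1 GB test size, in ns, as quoted above.
vram_latency_ns = {
    "scalar": {"RX 7900 XTX": 221.24, "RX 6900 XT": 260.0},
    "vector": {"RX 7900 XTX": 234.55, "RX 6900 XT": 283.89},
}


def percent_lower(new_ns: float, old_ns: float) -> float:
    """How much lower new_ns is than old_ns, as a percentage of old_ns."""
    return (old_ns - new_ns) / old_ns * 100.0


for path, cards in vram_latency_ns.items():
    gain = percent_lower(cards["RX 7900 XTX"], cards["RX 6900 XT"])
    print(f"{path}: RX 7900 XTX is {gain:.1f}% lower than RX 6900 XT")
```

That works out to roughly 15% lower latency for scalar accesses and about 17% lower for vector accesses.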

Additional testing done, with more test points after 64 MB

An unresolved question is why the latency test cannot see RDNA 3’s full Infinity Cache capacity. On the RX 6900 XT, we see an inflection point at or very close to 128 MB, its advertised cache capacity. On the RX 7900 XTX, there’s an inflection point around 64 MB. Perhaps some cache capacity is reserved for fixed function units.
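One simple way to locate such an inflection point in latency test output is to look for the first test size after which latency jumps by more than some ratio. The sketch below is an assumption about how one might do that, not the method used for the plots here. The `find_inflection` helper, the 10% threshold, and above all the data points are made up for illustration.

```python
def find_inflection(points, min_jump=1.10):
    """Return the last test size before latency first rises by more than min_jump x.

    points is an iterable of (size_mb, latency_ns) pairs.
    Returns None if no adjacent pair exceeds the threshold.
    """
    ordered = sorted(points)
    for (size_mb, latency_ns), (_, next_latency_ns) in zip(ordered, ordered[1:]):
        if next_latency_ns > latency_ns * min_jump:
            return size_mb
    return None


# Synthetic, illustrative numbers only; NOT measurements from any GPU.
synthetic_curve = [
    (16, 100.0),
    (32, 101.0),
    (48, 102.0),
    (64, 104.0),
    (96, 150.0),
    (128, 190.0),
]

print(find_inflection(synthetic_curve))
```

With a curve shaped like this synthetic one, the helper reports 64, the last size that still fits before latency climbs. How well it works on real data depends on how densely the test sizes are spaced around the cache capacity.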

Final Words

Testing is hard, and many things can complicate it, including boost behavior and power saving. With the corrected data, RDNA 3’s Infinity Cache outperforms its predecessor’s in every respect except capacity.

If you like our articles and journalism, and you want to support us in our endeavors, then consider heading over to our Patreon or our PayPal if you want to toss a few bucks our way. If you would like to talk with the Chips and Cheese staff and the people behind the scenes, then consider joining our Discord.
