AMD Radeon RX 8900 XTX Detailed: A Chiplet Architecture Made of 12 Mini-GPUs

A new patent from AMD details what could be the basis of its next generation of graphics cards. It describes a complex chiplet architecture and likely represents the founding principles behind the (now canceled) Radeon RX 8900 XTX. The patent lays out the blueprint of a highly flexible chiplet GPU consisting of up to 12 dies working in parallel without a central or master die.

The AMD patent focuses on work distribution between chiplets consisting of specialized blocks, including the command processor, geometry engine, shader engine, and rasterizers. Each die processes its own share of the indices independently of the rest and then moves on to the next task in the pipeline. They can therefore be thought of as several mini-GPUs working in tandem, much like a MIMD (multiple instruction, multiple data) model.

When a draw call is issued, the first geometry engine calculates the portions of the index buffer to be fetched. This can be based on the number of geometry engines working on the draw call, the ID of the first engine, the number of indices to fetch for each portion, etc. This allows the various geometry engines to calculate their portions of the index buffer locally, independently, and in parallel.
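The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patent is not quoted here, so the round-robin interleaving of fixed-size portions and every name used (`portions_for_engine`, `first_engine_id`, and so on) are hypothetical. The point is that each engine needs only a handful of shared parameters to work out its own ranges, with no communication between engines.

```python
def portions_for_engine(engine_id, first_engine_id, num_engines,
                        indices_per_portion, total_indices):
    """Return the (start, end) index ranges one geometry engine would fetch.

    Assumption: portions are dealt out round-robin, starting with the
    engine whose ID is first_engine_id. All names are hypothetical.
    """
    # Position of this engine in the rotation, relative to the first engine.
    slot = (engine_id - first_engine_id) % num_engines
    # Distance between two consecutive portions owned by the same engine.
    stride = num_engines * indices_per_portion

    ranges = []
    start = slot * indices_per_portion
    while start < total_indices:
        end = min(start + indices_per_portion, total_indices)
        ranges.append((start, end))  # half-open range [start, end)
        start += stride
    return ranges


# Three engines (IDs 0..2), engine 1 goes first, 4 indices per portion.
for engine in (1, 2, 0):
    print(engine, portions_for_engine(engine, 1, 3, 4, 20))
```

Each call uses only its own arguments, so the three engines could run it at the same time and still cover the whole index buffer exactly once.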

The CPU communicates with the main memory and the GPU chiplets via the PCIe bus. The chiplets themselves are connected using a cross-link (Infinity Fabric). The applications running on the CPUs see the chiplets as a single entity, with the firmware being the only component capable of distinguishing between them.

The patent explains ways to distribute work between geometry engines. In one of them, the number of indices per portion is determined using the size of the primitive group. The size depends on the primitive type. Each geometry engine calculates its indices to process independently of the rest, allowing parallel execution without synchronization or barriers.
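As a rough illustration of how the portion size could follow from the primitive group, the sketch below multiplies the number of primitives in a group by the indices each primitive consumes. It covers only simple list topologies (where primitives do not share indices), and the function name and the group size in the example are assumptions made for illustration, not values taken from the patent.

```python
# Indices consumed per primitive for list topologies (no index sharing).
INDICES_PER_PRIMITIVE = {
    "point_list": 1,
    "line_list": 2,
    "triangle_list": 3,
}


def indices_per_portion(primitive_type, primitives_per_group):
    """Number of indices in one portion for a given primitive type.

    primitives_per_group is a hypothetical tuning parameter here.
    """
    return INDICES_PER_PRIMITIVE[primitive_type] * primitives_per_group


# Example: a group of 64 triangles needs 192 indices.
print(indices_per_portion("triangle_list", 64))
```

Strips are more involved because neighbouring primitives share indices, so adjacent portions would have to overlap; that case is left out of this sketch.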

The figure below illustrates the drawing of three different strips by separate geometry engines. The shaded primitives are dropped due to the index reset, and the dotted arc indicates the direction of the winding order. Index resets reverse the direction of the winding order from counter-clockwise to clockwise or vice versa.
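To make the effect of an index reset more concrete, here is a small Python sketch that decodes a triangle strip containing a reset marker. It follows the conventional primitive-restart behavior of common graphics APIs rather than anything specific to the patent: the marker value `0xFFFF`, the function name, and the way winding is tracked with a parity bit are all assumptions for illustration.

```python
RESTART = 0xFFFF  # assumption: 16-bit index buffer with a reset marker


def decode_triangle_strip(indices, restart=RESTART):
    """Expand a triangle strip into triangles, honouring index resets."""
    triangles = []
    window = []   # the last (up to) three indices seen since the reset
    parity = 0    # 0 = keep order, 1 = swap first two to fix the winding

    for idx in indices:
        if idx == restart:
            # Any triangle that would span the marker is dropped, and the
            # winding parity starts over for the new strip.
            window = []
            parity = 0
            continue
        window.append(idx)
        if len(window) == 3:
            a, b, c = window
            triangles.append((a, b, c) if parity == 0 else (b, a, c))
            parity ^= 1
            window = [b, c]
    return triangles


print(decode_triangle_strip([0, 1, 2, 3, RESTART, 4, 5, 6]))
```

Without the marker, the eight entries would form six triangles; with it, the three that would touch the marker are never emitted and the strip starting at index 4 begins with fresh winding parity.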

Another method described in the patent uses a state management scheme to synchronize the disaggregated chiplets. For every draw call, each command processor generates a state ID corresponding to the pipelines (processing stages) carried out by the chiplets.

Much of the chiplet GPU patent breaks down ways to distribute the indices between the geometry engines (chiplets). In one instance, the command processor is paired with two pipelines. Consequently, each of its two geometry engines, shader engines, and rasterizers is linked to one pipeline. Similarly, the remaining chiplets work on two pipelines in parallel.

In conclusion, the above patent discusses partitioning draw calls, primitives, and/or indices between the numerous GPU chiplets. The arrangement has to be such that the latency penalty and queue divergence are minimal. The shader engines on the different dies need to be adequately utilized and synchronized without the need to share data across the interconnect.
Temporal accumulation was the death of SLI and CrossFire, and there’s little chance for a chiplet GPU unless these issues are addressed. From the look of things, I’d say that we’re still at least two generations away from a chiplet GPU. The Radeon RX 8900 XTX has allegedly been canceled, leaving RDNA 4 with budget and midrange monolithic designs. That leaves the RDNA 5-powered RX 9900 XTX as the first potential chiplet graphics card from AMD.

Source: Freepatents (Via Elchapuzas)

Areej, December 8, 2023
