Nvidia recently introduced the new GeForce RTX 30 family of graphics cards, based on the Ampere architecture that replaces Turing. The previous architecture was revolutionary: it was the first to offer hardware support for ray tracing and hardware acceleration of artificial intelligence tasks using tensor cores. But the performance of those GPUs was sometimes insufficient to enable even a couple of ray-traced effects, so it is not surprising that with Ampere Nvidia focused on performance.
As soon as finer semiconductor process technologies became available, allowing a significant increase in the number of transistors while keeping the die area acceptable, Nvidia used them in Ampere to improve performance rather than to add new capabilities. New capabilities do exist, but they are clearly an evolutionary development of what the previous Turing architecture offered. Offered at affordable prices, the new products have given users a welcome improvement in the price-performance ratio.
Thanks to specific design decisions and a finer manufacturing process, the Ampere family delivers higher energy efficiency and higher performance per unit of die area. This is especially useful in the most demanding tasks, such as ray tracing in games, which significantly reduces frame rates. Ampere gaming solutions are about 1.5-1.7 times faster than Turing in traditional rasterization tasks, and up to twice as fast in ray tracing.
The first graphics processor based on the Ampere architecture was the large “compute” chip GA100, which was released in May and showed large performance gains in various computing tasks. But it is a purely compute chip designed for highly specialized applications. The Ampere-based GeForce RTX 30 series gaming graphics cards were unveiled by CEO Jensen Huang during Nvidia’s virtual event in early September.
In total, three models were presented: the RTX 3090, RTX 3080 and RTX 3070. We have already examined the middle one, today we cover the top-end model in full, and the junior model’s turn will come in October. The RTX 3090 and RTX 3080 are based on different variants of the GA102 chip, with different numbers of active compute units. While the junior RTX 3070 should perform at about the level of the previous generation’s flagship, the RTX 2080 Ti, the top-end RTX 3090 outperforms the expensive Titan RTX by 50%.
The most powerful model of the new line has 10,496 CUDA cores and 24 GB of local video memory of the new GDDR6X standard, and is positioned for gaming at resolutions up to 8K. This is a Titan-class model with a price tag of $1,499 (136,990 rubles) but a regular numeric name: this time Nvidia has decided not to release a Titan (yet?). The three-slot card with its huge cooler can handle any task, gaming or otherwise. The new card is designed for gaming at 4K resolution at the very least, and can even deliver 60 FPS at 8K in many games, especially with DLSS.
The card we are reviewing today is built on a new graphics processor of the Ampere architecture. Since Ampere has quite a lot in common with the previous Turing and Volta architectures, and in places even with Pascal, we recommend reading our earlier articles on those architectures before continuing.
This is the second model of the new generation, and its name follows the company’s naming scheme, since the less expensive RTX 3080 sits below it. True, the previous generation had no RTX 2090 at all, but it did have a separate Titan RTX. Accordingly, the recommended price of the GeForce RTX 3090, $1,499, is not close to that of the RTX 2080 but falls somewhere between the RTX 2080 Ti and the Titan RTX, the top representatives of their generation. For our market, the recommended price of 136,990 rubles may have seemed inflated at first, but given the sharp fall of the national currency it would have had to be adjusted upward anyway.
Either way, the RTX 3090 simply has no competition on the market, and Nvidia can price it as it sees fit. More precisely, it does have a rival, and quite a strong one, but it is a model from the same line: the RTX 3080, which even in theoretical performance trails the top solution by 20%-25%. And it costs much less! So anyone for whom 10 GB of video memory and slightly lower performance are enough will be strongly tempted to save money. On the other hand, if you need maximum performance and a large amount of memory, and price is a distant third consideration, then there is simply no alternative.
So far, there is nothing to say about competitors from AMD. The Radeon VII is outdated and has long been out of production, the Radeon RX 5700 XT is a lower-tier solution, and AMD has nothing else. So we are waiting for solutions based on the RDNA2 architecture; the large “Big Navi” chip will be especially interesting, although it is far from certain that even it will be able to compete with the GeForce RTX 3090.
Nvidia has released the new series of video cards in its own design under the Founders Edition name. They feature unusual cooling systems and an austere look not found among most graphics card manufacturers, who compete on the number and size of fans and on colorful lighting. The most interesting thing about the GeForce RTX 30 cards sold under the Nvidia brand is a completely new cooler design with two fans arranged in an unusual way: the first blows air more or less conventionally through the grille at the end of the board, while the second is installed on the back and pulls air straight through the video card.
In this design, heat is transferred from the components on the card to a hybrid vapor chamber, which distributes it along the entire length of the heatsink. The left fan expels heated air through the large vents in the mounting bracket, while the right fan directs air toward the case exhaust fan, which is where one is usually located in most modern systems. The two fans run at different speeds, which can be adjusted individually.
This decision forced the engineers to rework the entire design. Conventional PCBs run the full length of the card, but the blow-through fan required a short PCB, with a smaller NVLink connector and new power connectors (an adapter for two ordinary 8-pin PCI-E connectors is included). It was also very difficult to fit a large number of power phases and memory chips on the card. But these changes made room for a large fan cutout so that the PCB does not obstruct the airflow.
Nvidia claims that the Founders Edition cooler design runs noticeably quieter than standard coolers with two axial fans on one side, while also cooling more efficiently. As a result, the new cooling solution allowed performance to be improved without increasing temperature and noise compared to the previous-generation Turing graphics cards. According to the company, at a power consumption of 350 W the card under review is either 30 degrees cooler than the Titan RTX or 20 dBA quieter. We will check this later.
The RTX 3090 graphics card has been available in retail stores since September 24, but due to insufficient production and persistently high demand, a card at a good price will still take some hunting. GeForce RTX 30 Founders Edition graphics cards should go on sale on Nvidia’s Russian-language site on October 6. Naturally, the company’s partners are also releasing cards of their own design: Asus, Colorful, EVGA, Gainward, Galaxy, Gigabyte, Innovision 3D, MSI, Palit, PNY and Zotac.
Some of the graphics cards sold by participating retailers from September 17 to October 20 come bundled with Watch Dogs: Legion and a one-year GeForce Now subscription. GeForce RTX 30 series GPUs will also be installed in gaming systems from Acer, Alienware, Asus, Dell, HP, Lenovo and MSI, and in systems from leading Russian system builders, including Boiling Machine, Delta Game, Hyper PC, InvasionLabs, OGO! and Edelweiss.
The GA102 is manufactured on Samsung’s 8 nm process, additionally optimized for Nvidia. The largest Ampere gaming chip contains 28.3 billion transistors and has an area of 628.4 mm², a good step up from the 12 nm process used for Turing, but TSMC’s 7 nm process still surpasses Samsung’s 8 nm in density. This can be seen within the Ampere architecture itself by comparing the gaming GA102 with the large GA100 chip, which is produced in Taiwanese factories.
Most likely, Nvidia chose Samsung’s process based on cost and the availability of volume production for large chips. The yield of good dies at Samsung’s fab may well be better, the terms for such a large customer are certainly special, and TSMC’s 7 nm production capacity is already taken up by other companies. So the gaming Ampere chips are manufactured in Samsung factories, most likely because Nvidia did not accept the prices or other conditions offered by the Taiwanese company.
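The density comparison can be put into numbers with a quick calculation. This is a minimal sketch: the GA102 figures are the ones quoted above, while the GA100 and TU102 transistor counts and die areas are assumptions taken from Nvidia's public specifications rather than from this article.

```python
def density_mtr_per_mm2(transistors_billion, area_mm2):
    """Return transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000.0 / area_mm2

# GA102 (Samsung 8 nm): figures quoted in the article.
ga102 = density_mtr_per_mm2(28.3, 628.4)

# Assumed figures from Nvidia's public specifications (not from this article):
ga100 = density_mtr_per_mm2(54.2, 826.0)  # GA100, TSMC 7 nm
tu102 = density_mtr_per_mm2(18.6, 754.0)  # TU102 (Turing), TSMC 12 nm

for name, value in (("TU102", tu102), ("GA102", ga102), ("GA100", ga100)):
    print(f"{name}: {value:.1f} MTr/mm^2")
```

Under these assumptions GA102 comes out at roughly 45 million transistors per mm², well ahead of Turing's TU102 but noticeably behind GA100.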
Like the company’s previous chips, the GA102 consists of large Graphics Processing Clusters (GPCs) that include multiple Texture Processing Clusters (TPCs), which in turn contain Streaming Multiprocessors (SMs), along with Raster Operator (ROP) units and memory controllers. The complete GA102 chip contains seven GPCs, 42 TPCs, and 84 SMs. Each GPC contains six TPCs, each consisting of a pair of SMs and one PolyMorph Engine for geometry processing.
The GPC is a high-level cluster that contains all the key data processing blocks. Each GPC has a dedicated Raster Engine and now includes two ROP partitions of eight units each: in the new Ampere architecture these units are no longer tied to the memory controllers but are located directly in the GPC. In total, the full GA102 contains 10,752 CUDA cores, 84 second-generation RT cores, and 336 third-generation tensor cores. The memory subsystem of the complete GA102 contains twelve 32-bit memory controllers, for a total bus width of 384 bits. Each 32-bit controller is paired with 512 KB of L2 cache, giving a total of 6 MB of L2 cache for the full GA102.
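The totals for the full chip follow directly from the hierarchy described above. The short sketch below simply multiplies out the per-level counts given in the text; the variable names are illustrative only.

```python
# Hierarchy of the full GA102, as described in the text.
GPCS = 7
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
ROP_PARTITIONS_PER_GPC = 2
ROPS_PER_PARTITION = 8

# Memory subsystem of the full GA102.
MEMORY_CONTROLLERS = 12
CONTROLLER_WIDTH_BITS = 32
L2_PER_CONTROLLER_KB = 512

tpcs = GPCS * TPCS_PER_GPC                                  # 42 TPCs
sms = tpcs * SMS_PER_TPC                                    # 84 SMs
rops = GPCS * ROP_PARTITIONS_PER_GPC * ROPS_PER_PARTITION   # 112 ROPs
bus_width_bits = MEMORY_CONTROLLERS * CONTROLLER_WIDTH_BITS  # 384-bit bus
l2_cache_mb = MEMORY_CONTROLLERS * L2_PER_CONTROLLER_KB / 1024  # 6 MB

# Per-SM ratios implied by the quoted chip-wide totals.
cuda_per_sm = 10752 // sms   # 128 CUDA cores per SM
tensor_per_sm = 336 // sms   # 4 tensor cores per SM
rt_per_sm = 84 // sms        # 1 RT core per SM

print(tpcs, sms, rops, bus_width_bits, l2_cache_mb)
print(cuda_per_sm, tensor_per_sm, rt_per_sm)
```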
So far we have been talking about the full chip, but even the top model, the GeForce RTX 3090, uses a version of GA102 with slightly fewer active units. In this variant all seven GPC clusters remain active, and the number of SMs is reduced by only two: one TPC with its pair of multiprocessors is simply disabled in one of the GPCs. The other unit counts change accordingly: 10,496 CUDA cores, 328 tensor cores, and 82 RT cores.
That leaves 328 texture units, while all 112 ROP units remain active. These figures are noticeably higher than those of the RTX 3080, but this is still not the complete chip. Another major difference from the GeForce RTX 3080 is the 24 GB of fast GDDR6X memory, connected over the full 384-bit bus, which gives almost a terabyte per second of bandwidth. Unlike the 10 GB of the “mid-range” RTX 3080, this is definitely enough for everything. Nvidia says that no game requires more memory at 4K, but next-gen consoles with more memory and fast SSDs are coming soon, and some multiplatform or ported games may start to require more than 10 GB of local video memory.
The bandwidth has also increased, reaching 936 GB/s. But for such a powerful GPU even this may not always be enough, especially where overall performance has doubled. In addition, while Micron lists an effective memory frequency of 21 GHz, Nvidia uses a rather conservative 19.5 GHz for the RTX 3090. One wonders why: is it the immaturity of the new memory type, excessive power consumption, or both?
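The RTX 3090 unit counts can be reproduced from the number of disabled blocks. This is a small sketch; the per-SM ratios (128 CUDA cores, 4 tensor cores, 1 RT core) are derived from the full-chip totals quoted earlier, and the helper name is hypothetical.

```python
FULL_GA102_SMS = 84
SMS_PER_TPC = 2

# Per-SM ratios implied by the full-chip totals (10,752 / 336 / 84 over 84 SMs).
CUDA_PER_SM = 128
TENSOR_PER_SM = 4
RT_PER_SM = 1

def unit_counts(disabled_tpcs):
    """Return (SMs, CUDA cores, tensor cores, RT cores) for a cut-down GA102."""
    sms = FULL_GA102_SMS - disabled_tpcs * SMS_PER_TPC
    return sms, sms * CUDA_PER_SM, sms * TENSOR_PER_SM, sms * RT_PER_SM

# RTX 3090: one TPC (two SMs) disabled.
rtx_3090 = unit_counts(disabled_tpcs=1)
print(rtx_3090)
```

With one TPC disabled the function gives 82 SMs, 10,496 CUDA cores, 328 tensor cores and 82 RT cores, matching the figures in the text.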
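The bandwidth figure is simple arithmetic: the effective data rate per pin multiplied by the bus width, divided by eight bits per byte. The sketch below reproduces the 936 GB/s figure and shows what the same 384-bit bus would deliver at the 21 GHz effective frequency that Micron lists; the function name is illustrative.

```python
def memory_bandwidth_gb_s(effective_rate_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s for a given per-pin data rate and bus width."""
    return effective_rate_gbps * bus_width_bits / 8

BUS_WIDTH_BITS = 384

as_shipped = memory_bandwidth_gb_s(19.5, BUS_WIDTH_BITS)    # RTX 3090 as configured
at_micron_spec = memory_bandwidth_gb_s(21.0, BUS_WIDTH_BITS)  # hypothetical 21 GHz setting

print(f"19.5 GHz effective: {as_shipped:.0f} GB/s")
print(f"21.0 GHz effective: {at_micron_spec:.0f} GB/s")
```

At the higher setting the card would cross the 1 TB/s mark, so the conservative memory clock costs a little over 7% of the theoretical bandwidth.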
We will not go into the architectural improvements of Ampere in detail in this article; they are all covered in the theoretical material on the GeForce RTX 3080. The main innovation of Ampere is the doubling of FP32 performance per SM compared to the Turing family, which led to a significant increase in peak performance. Almost the same applies to the RT cores: although their number has not changed, internal improvements have doubled the rate of ray-geometry intersection tests. The improved tensor cores did not double performance under normal conditions: each core computes twice as fast, but there are half as many of them per SM as in Turing. What is new is the ability to double the processing speed for so-called sparse matrices.
All other architectural features of the Ampere gaming solutions, including changes to the SMs, ROP units, caching and texturing system, tensor cores and RT cores, are discussed in detail in the theoretical review of the RTX 3080. It also contains information about the new GDDR6X memory type, which is used in the higher-end cards of the new line. Together these improvements deliver fairly high energy efficiency. The entire Ampere architecture was designed with this in mind, including the modified Samsung process, the chip and printed circuit board design, software optimization and much more.
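To illustrate what doubling FP32 throughput per SM means for peak numbers, here is a rough sketch. It assumes 64 FP32 lanes per SM for Turing and 128 for Ampere, counts a fused multiply-add as two operations, and uses a boost clock of 1.695 GHz for the RTX 3090; the clock is an assumption taken from Nvidia's public specifications, not from this article, and real clocks vary.

```python
def peak_fp32_tflops(sm_count, fp32_lanes_per_sm, clock_ghz):
    """Theoretical peak FP32 throughput in TFLOPS (one FMA = 2 operations)."""
    return sm_count * fp32_lanes_per_sm * 2 * clock_ghz / 1000.0

TURING_FP32_PER_SM = 64    # assumed per-SM FP32 lanes for Turing
AMPERE_FP32_PER_SM = 128   # per-SM FP32 lanes for gaming Ampere

RTX_3090_SMS = 82
ASSUMED_BOOST_GHZ = 1.695  # assumption: official boost clock, actual clocks differ

ampere = peak_fp32_tflops(RTX_3090_SMS, AMPERE_FP32_PER_SM, ASSUMED_BOOST_GHZ)
# Hypothetical Turing-style SM with the same SM count and clock, for comparison.
turing_style = peak_fp32_tflops(RTX_3090_SMS, TURING_FP32_PER_SM, ASSUMED_BOOST_GHZ)

print(f"Ampere SM:       {ampere:.2f} TFLOPS")
print(f"Turing-style SM: {turing_style:.2f} TFLOPS")
```

This is a theoretical peak only. Real games do not see a twofold gain from it, which is consistent with the 1.5-1.7x rasterization advantage quoted at the start of the article.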
We will add just one short note about RTX IO, an interesting set of technologies for fast transfer and decompression of resources on the GPU, which increases I/O performance tenfold compared to conventional HDDs and traditional APIs. In the future, RTX IO will enable very fast loading of game resources and allow much more varied and detailed virtual worlds to be created.
RTX IO decompresses data on the GPU’s stream processors. This is done asynchronously, using high-performance compute kernels. The direct memory access mechanisms of the Turing and Ampere architectures, an improved instruction set, and the new SM architecture with its advanced asynchronous compute capabilities also help in the process.
Nvidia already had almost everything needed for this in its proprietary GPUDirect Storage technology, with the exception of decompressing data on the GPU. That is precisely the fundamentally new feature of RTX IO and the DirectStorage API. With Nvidia GPUs it was already possible to implement a similar approach on Linux, but Windows has certain fundamental architectural limitations that prevent direct data exchange from being fully realized.
Developers will therefore have to wait while Microsoft implements these capabilities in its own DirectStorage API. However, this should not be much of a hindrance, as it is unlikely that the coming years will bring games, even ports from next-generation consoles, that can take full advantage of fast SSDs. So far, developers are still targeting mechanical hard drives, but since the market share of SSDs (NVMe in particular) is growing rapidly, give it a couple of years and such games will definitely appear.