Portal: Prelude RTX is an impressive showcase of Nvidia’s RTX Remix technology, which takes what was once a Source mod for Portal and gives it visual features and technology that rival and even surpass high-end triple-A releases. It is truly spectacular – and hopefully one of many path-traced remasters to come as the RTX Remix modding tools approach release.
More interestingly though, Prelude is also the first game to support RTX IO, a GPU-accelerated decompression scheme running under Vulkan. This is essentially an Nvidia-branded version of DirectStorage 1.2, which is also included in Ratchet and Clank: Rift Apart, launching on PC later this month. Its purpose is to accelerate game loading and asset streaming on the PC platform, and its inclusion here gives us a good excuse to see how the technology works.
Historically, loading involved game data like textures or models being transferred from a hard drive to system memory and then on to the GPU under the control of the CPU. This was quite a latency-heavy, serial approach, as the drive had to physically spin up, locate the data, then load it block by block in a way that minimised the amount of seeking necessary.
This technique worked well enough with relatively small game assets being loaded from HDDs, but with games now being hundreds of gigabytes in size with extremely detailed assets, all of this data needs to be compressed to make good use of the available storage space and bandwidth. That means assets need to be decompressed by the CPU before they can be used on the GPU, and the extra time and CPU burden this imposes means that the traditional approach starts to break down.
Happily, the advent of rapid, low-latency flash storage in SSDs means that we don’t need to read data in a sequential way to minimise seek times – we can create a new standard. First, we want to access data in parallel to massively reduce load times compared to the old Windows I/O standard. Secondly, we want to ensure that data is moved from storage to the GPU before it’s decompressed. GPUs have lots of cores and do massively parallel tasks like decompression better than CPUs, so this approach saves a lot of time. This is the new system envisioned for RTX IO and DirectStorage 1.2, and it delivers faster loading times and, when used in gameplay for streaming purposes, a reduction in CPU load which can potentially improve performance.
For RTX IO, as implemented here in Portal: Prelude RTX, the data on disk is compressed using the GDeflate format and moved over to system memory temporarily, then over to VRAM and decompressed there by the GPU. This GDeflate format is an open GPU compression standard from Nvidia, which was given over to Microsoft and the Khronos Group, and is the format I expect to see used in DirectStorage 1.2 games using DirectX on PC – with GPUs from Nvidia, AMD and Intel all supported.
In contrast, Portal: Prelude RTX uses the Vulkan graphics API, which has no agreed-upon vendor-agnostic standard calls for GPU decompression; as far as I know there are currently just the proposed extensions from Nvidia. These extensions from Nvidia for GPU decompression could potentially be the ones that are adopted wholesale by the Khronos Group for Vulkan’s DirectStorage equivalent. In the meantime, the fast GPU decompression in Portal: Prelude RTX will only work on drivers that support these specific extensions, i.e. on Nvidia RTX graphics cards.
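To make the first of those two goals concrete, here is a minimal Python sketch of serial versus parallel asset reads. It is not the RTX IO or DirectStorage API: the file names, sizes, worker count and helper functions (`make_demo_assets`, `load_serial`, `load_parallel`) are all hypothetical, and a thread pool merely stands in for the idea of keeping many read requests in flight at once.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Read one asset file fully into memory."""
    with open(path, "rb") as f:
        return f.read()


def load_serial(paths):
    """Old-style loading: one read at a time, in order."""
    return [read_file(p) for p in paths]


def load_parallel(paths, workers=8):
    """Issue many reads at once; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(read_file, paths))


def make_demo_assets(directory, count=16, size=64 * 1024):
    """Write hypothetical asset files so the sketch is self-contained."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"asset_{i}.bin")
        with open(path, "wb") as f:
            f.write(bytes([i]) * size)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    paths = make_demo_assets(tmp)
    serial = load_serial(paths)
    parallel = load_parallel(paths)

print(len(parallel), "assets loaded; identical to serial:", parallel == serial)
```

Both functions return the same data in the same order; the difference is only in how many requests the storage device sees at once, which is where an SSD can pull ahead of a seek-bound hard drive.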
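The difference between the two data paths can be sketched by counting the bytes that cross each hop. This is an illustration under loud assumptions: Python's `zlib` (plain DEFLATE) stands in for GDeflate, no GPU is involved, the "asset" is synthetic and unusually compressible, and the function names are hypothetical.

```python
import zlib

# Synthetic stand-in for a texture. Real assets compress far less well.
asset = bytes(range(256)) * 4096  # 1 MiB

# What sits on disk: the compressed payload (DEFLATE here, not GDeflate).
on_disk = zlib.compress(asset, 6)


def load_decompress_at_destination(payload):
    """RTX IO-style path: the payload stays compressed across both hops."""
    moved = 0
    system_memory = payload              # hop 1: storage -> system memory
    moved += len(system_memory)
    vram = system_memory                 # hop 2: system memory -> VRAM
    moved += len(vram)
    return zlib.decompress(vram), moved  # decompression happens last


def load_decompress_on_cpu(payload):
    """Traditional path: the CPU decompresses before the copy to VRAM."""
    moved = len(payload)                       # hop 1: storage -> system memory
    system_memory = zlib.decompress(payload)   # CPU does the decompression
    moved += len(system_memory)                # hop 2: full-size data -> VRAM
    return system_memory, moved


gpu_asset, gpu_moved = load_decompress_at_destination(on_disk)
cpu_asset, cpu_moved = load_decompress_on_cpu(on_disk)

print("bytes moved, decompress at destination:", gpu_moved)
print("bytes moved, decompress on CPU:", cpu_moved)
```

Both paths end with the same decoded asset, but the first keeps the data small for the whole journey and leaves the CPU out of the decompression step.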
However, Portal: Prelude RTX still operates on a more traditional loading paradigm, which means RTX IO doesn’t boost frame-rates. After all, RTX Remix is not replacing the game engine or altering how levels are split up and loaded; RTX Remix is instead just changing how rendering is done and how assets are loaded to feed that rendering. This is different from Ratchet and Clank: Rift Apart, which should also be using GPU decompression to speed up gameplay. Portal: Prelude RTX therefore mainly benefits in terms of dedicated load times and visible texture load times.
To see how much of an effect the tech has here, I tested a build of the game with RTX IO off, running from a SATA SSD capped at 500MB/s. The game loads reasonably quickly, but the textures take some time to reach their highest quality – without RTX IO’s GDeflate compression, the game on disk is fully uncompressed and ~60 percent larger. So bandwidth is taxed correspondingly to move the textures into VRAM, taking a little more than a second for the last texture to load. With RTX IO on, that same texture on a SATA SSD loads in less than half the time.
| Configuration | Load to Game | Texture Load |
| --- | --- | --- |
| 12900K + 500MB/s SATA SSD + RTX IO off | 1.13s | 2.36s |
| 12900K + 500MB/s SATA SSD + RTX IO on | 0.67s | 1.16s |
| 12900K + 3.5GB/s NVMe SSD + RTX IO off | 0.57s | 1.45s |
| 12900K + 3.5GB/s NVMe SSD + RTX IO on | 0.53s | 1.07s |
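For readers who want the relative gains rather than raw seconds, the short Python snippet below works out the percentage reduction from the four rows above. The timings are copied from the table; the dictionary layout and the `reduction_percent` helper are just my own illustration.

```python
# Timings in seconds from the table: (load to game, texture load).
results = {
    "SATA": {"off": (1.13, 2.36), "on": (0.67, 1.16)},
    "NVMe": {"off": (0.57, 1.45), "on": (0.53, 1.07)},
}


def reduction_percent(before, after):
    """Percentage of time saved going from `before` to `after`."""
    return round((before - after) / before * 100, 1)


summary = {}
for drive, runs in results.items():
    load = reduction_percent(runs["off"][0], runs["on"][0])
    texture = reduction_percent(runs["off"][1], runs["on"][1])
    summary[drive] = (load, texture)
    print(f"{drive}: load to game -{load}%, texture load -{texture}%")

# SATA: load to game -40.7%, texture load -50.8%
# NVMe: load to game -7.0%, texture load -26.2%
```

The slower the drive, the more RTX IO helps, which fits the idea that the uncompressed build is mostly limited by how much data has to come off the disk.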
This isn’t exactly the largest real-world difference, as half a second is gone in a flash, but halving the time is still impressive. After having done a number of tests in different configurations, I have two interesting takeaways. First, a 500MB/s SATA drive with RTX IO enabled beats a 3.5GB/s NVMe drive with RTX IO off for texture loading (1.16s versus 1.45s) – pretty outstanding. Secondly, CPU and GPU hardware differences did not dramatically impact loading times, with the RTX 2060 Super + Core i9 12900K system performing much the same as the same CPU with the flagship RTX 4090; an RTX 4070 and Ryzen 5 3600 system was also very close in terms of load times.
So Portal: Prelude RTX is a promising first outing for this technology on PC, but at the same time it’s mundane, as it is applied to a game that uses an old loading paradigm in the first place. Games that use active streaming and no loading screens of any sort, like Ratchet and Clank: Rift Apart and other future titles, are where this technology will show its mettle best. Of course, we’re looking forward to covering that title very soon, with the game arriving on PC on July 26th – so stay tuned.