rend3
Performance Issues with High Memory Usage
This is a request for two enhancements related to texture memory usage:
1. The ability to fetch a texture back into the CPU given a TextureHandle.
2. Some way to tell how much GPU memory is left.
When memory gets tight, the application can react by fetching a texture from the GPU, reducing it to a lower resolution, and replacing the big texture with a smaller one.
Currently I do this by re-reading the texture from disk, re-decompressing it, and reducing the resolution. This is slow, I/O-bound, and steals disk time from loading new textures. If I can request the texture back from the GPU, reduce it in size, and create a new, reduced-size texture without doing any I/O, that's a big win. Mip-mapping won't help; it doesn't reduce texture memory usage, just GPU load. So what's needed is the inverse operation of add_texture_2d:
Proposed addition to Rend3 API:
impl Renderer {
    pub fn fetch_texture_data_2d(&self, handle: TextureHandle) -> Result<(UVec2, Vec<u8>), Error> {
        ...
    }
}
This returns the "size" and "data" fields from the Texture previously passed into add_texture_2d.
This is just data access; there's no need to change anything GPU side. Changes will be made by creating a new Texture and modifying the Material using the existing API, usually resulting in dropping the old TextureHandle and releasing the space.
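To make the intent concrete, here's a minimal sketch of the downsize-and-replace flow this would enable. fetch_texture_data_2d is the proposed (not existing) API above, downscale_rgba8 is a hypothetical CPU downscaling helper, and the Texture fields follow the 0.3-era rend3 API as I understand it:

use std::num::NonZeroU32;

fn halve_texture(
    renderer: &rend3::Renderer,
    handle: rend3::types::TextureHandle,
) -> rend3::types::TextureHandle {
    // Proposed API: pull the original pixels back from the GPU.
    let (size, data) = renderer.fetch_texture_data_2d(handle).unwrap();

    // Reduce resolution on the CPU -- no disk I/O, no re-decompression.
    let new_size = (size / 2).max(glam::UVec2::ONE);
    let new_data = downscale_rgba8(&data, size, new_size); // hypothetical helper

    // Upload the smaller replacement; once the Material is repointed at it,
    // dropping the old TextureHandle releases the original allocation.
    renderer.add_texture_2d(rend3::types::Texture {
        label: None,
        data: new_data,
        format: rend3::types::TextureFormat::Rgba8UnormSrgb,
        size: new_size,
        mip_count: rend3::types::MipmapCount::Specific(NonZeroU32::new(1).unwrap()),
        mip_source: rend3::types::MipmapSource::Uploaded,
    })
}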
As for finding out how much memory is left, I gather this is difficult. But some way to tell when the program is about to exceed the GPU's memory capacity is needed.
Thanks.
Background, from issue #346:
Regarding texture memory usage, I now have a better idea of what's going on.
The only tool I have for seeing what the GPU is doing is NVidia's X Server utility, which reports the total memory in use on the GPU. What I see is that, after loading and unloading many textures, the memory usage number goes up and never comes back down much, although it does drop by 5-10% or so. The number shown tracks the peak texture usage, not the current texture usage. It turns out my peak is too big: as the program frantically loads and unloads textures while the camera moves, the textures taking up the most screen space get priority, so downsizing distant textures is low priority, and fast camera movement causes a large but temporary spike in texture memory usage.
So this looks like a case where the GPU memory allocator acquires memory for texture purposes and does not give it back in a way that the NVidia X server utility can see. This is common behavior for allocators that get their memory from some lower-level allocator, such as the operating system's, and don't return it, or only return it when they can return a big block.
Is that what's going on down at the allocation level? That would explain what I'm seeing. Thanks.
Re fetch texture: how would this be better than the current situation with add_texture_from_texture, which you can use to chop off the top mip level and make a new texture without it? In addition, this API is going to need to be async and will likely take up to 3 frames to get the data back to you.
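For illustration, a rough sketch of that approach, assuming rend3's TextureFromTexture descriptor takes a source handle plus a starting mip level (field names from memory; check the current docs for the exact signature). Skipping mip 0 yields a half-resolution copy without any CPU readback or disk I/O:

let reduced = renderer.add_texture_from_texture(rend3::types::TextureFromTexture {
    label: None,
    src: big_texture_handle,
    start_mip: 1,    // drop the top (full-resolution) mip level
    mip_count: None, // keep all remaining levels
});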
Re memory usage: we should be reclaiming the data, but I can double check that texture deletions actually are causing deletions.
Filed https://github.com/gfx-rs/wgpu/issues/2447 about memory usage queries.
add_texture_from_texture looks useful. I'm not currently providing mipmapped versions of textures, because mips would make every texture at least 1.25x larger, and 1.3333x with a full mip chain. But let me think about that. A strategy based on this might work.
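As a sanity check on those figures: each mip level has a quarter of the pixels of the one before, so one extra level costs exactly 1.25x and a full chain converges to 1 + 1/4 + 1/16 + ... = 4/3, about 1.3333x. A quick self-contained computation:

fn mip_chain_bytes(width: u32, height: u32, bytes_per_pixel: u64, levels: u32) -> u64 {
    (0..levels)
        .map(|l| ((width >> l).max(1) as u64) * ((height >> l).max(1) as u64))
        .sum::<u64>()
        * bytes_per_pixel
}

fn main() {
    let base = mip_chain_bytes(1024, 1024, 4, 1);
    let full = mip_chain_bytes(1024, 1024, 4, 11); // 1024x1024 down to 1x1
    println!("full chain is {:.4}x the base level", full as f64 / base as f64); // ~1.3333
}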
I'm pretty sure texture deletions really are causing deletions. Or at least the memory is reused for new textures. Otherwise I'd see constantly increasing memory usage as I move the camera. What I see is increase to a peak, but little decrease from the peak, as measured by the NVidia X tool. I can move the camera for minutes with textures being loaded and unloaded without the peak increasing. This may be an artifact of how the memory usage is reported by the NVidia tool.
OK. I've thought about how to do this, and I don't need the ability to get a texture back. So cancel that request.
I still need some info about the GPU memory situation, though, to know when to start reducing texture resolution.
Uh oh. Tried enabling mip-mapping. No other change. Frame rate dropped from 43 FPS to 22 FPS. GPU not out of memory. Changes looked like this:
diff rendertextureconvert.rs ../obsolete/rendertextureconvert.rs
130c130
< let mips = 1; // ***TEMP** no mipmapping
---
> ////let mips = 1; // ***TEMP** no mipmapping
143,144c143,144
< mip_count: rend3::types::MipmapCount::Specific(NonZeroU32::new(mips).unwrap()),
< mip_source: rend3::types::MipmapSource::Uploaded,
---
> mip_count: rend3::types::MipmapCount::Maximum,
> mip_source: rend3::types::MipmapSource::Generated,
Undid change, rebuilt, frame rate back up to 43 FPS.
So this is generating mipmaps on the GPU every time you upload a texture; if you are constantly uploading new textures, that could cause issues on both the CPU and GPU side.
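If uploads stay frequent, one possible mitigation is to generate the mip chain on the CPU and upload it with MipmapSource::Uploaded, so the GPU never has to regenerate mips. This is only a sketch: downscale_half is a hypothetical helper (e.g. backed by the image crate), and it assumes the 0.3-era API with mip levels concatenated largest-first in the data field:

use std::num::NonZeroU32;

fn upload_with_cpu_mips(
    renderer: &rend3::Renderer,
    full: Vec<u8>,     // RGBA8 pixels at full resolution
    size: glam::UVec2,
) -> rend3::types::TextureHandle {
    let mut data = full.clone();
    let mut level = full;
    let mut dim = size;
    let mut mips = 1u32;
    while dim.x > 1 || dim.y > 1 {
        let next = (dim / 2).max(glam::UVec2::ONE);
        level = downscale_half(&level, dim, next); // hypothetical 2x box-filter downscale
        data.extend_from_slice(&level);
        dim = next;
        mips += 1;
    }
    renderer.add_texture_2d(rend3::types::Texture {
        label: None,
        data,
        format: rend3::types::TextureFormat::Rgba8UnormSrgb,
        size,
        mip_count: rend3::types::MipmapCount::Specific(NonZeroU32::new(mips).unwrap()),
        mip_source: rend3::types::MipmapSource::Uploaded,
    })
}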
rend3 has an integration with tracy through the profiling library, which I would recommend using to get more detailed information about your runtime. See the tracy feature of scene-viewer for how to do this. There's also https://docs.rs/wgpu-profiler/0.8.0/wgpu_profiler/chrometrace/fn.write_chrometrace.html, which you can use to dump the GPU-side profiling information.
I'm not sure what to do at this point.
For me, Rend3 0.3.x is much worse than 0.2.2. I only used the unreleased version because I needed the fix for #332. But the unreleased version has much bigger vertex objects. And it's slower.
Enabling mipmapping appears to double texture memory consumption. The amount of memory needed should be only 1.3333x, but if allocations are rounded up to powers of two, that becomes 2x. So that's part of the bloat problem. And turning on mipmapping made things slower, not faster. I was at 59 FPS with 0.2.2. Now I'm at 22 FPS.
With vertex bloat and mipmapping bloat, an 8GB GPU isn't enough any more.
Is the lower performance inherent in adding Android support?
Should I ask for a backport of #332 to 0.2.x, go back, and stay there while the new problems are resolved?
Suggestions?
OK. Here's an initial Tracy profile. One frame, all content loaded, camera is stationary, nothing is happening except the refresh loop, no mipmapping, Rend3 Unreleased (pre 0.3.x). 48 FPS. Same as without profiling. Call stacks are not being captured, so this is rather coarse data.
My own code is barely doing anything here; it and the window event system are using 32 µs per frame.
With Rend3 0.2.2, that was about 58 FPS, so I've lost about 10 FPS with the new version.
This is the ideal case for frame rate; frame rate drops much lower when textures and meshes are being loaded from other threads. Now that I see how this works, I'll put in some calls to profiling::scope! so my own activity in other threads will show up. But that's not relevant to this capture.
This is basically just #533