
wip: Memory defragmentation

Open doitsujin opened this issue 4 months ago • 7 comments

Builds on all the reworks from the past couple of weeks to implement memory defragmentation.

Things still to do

  • [ ] Get rid of the format conversion context in the D3D9 front-end. Some D3D9 games may break until this is solved.
  • [x] Put a limit on the number of resources to relocate at once; processing thousands in one go isn't free and might lead to frame time spikes (see the sketch after this list).
  • [x] Only consider chunk allocations rather than the entire heap when deciding whether to defrag at all.
  • [ ] Test this a whole bunch
  • [x] Fix some useless error messages when we can't relocate a resource
  • [x] Work around some Nvidia driver bug which causes all apps to hang as soon as defragmentation happens (#4380)
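
For illustration, here is a minimal sketch of what such a per-pass relocation cap could look like; the queue, the struct names and the `maxPerPass` default are hypothetical and do not reflect DXVK's actual code:

```cpp
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical bookkeeping for a single pending move; real relocation
// records would carry Vulkan resource handles instead of plain IDs.
struct Relocation {
  uint64_t resourceId; // resource to move
  uint64_t dstOffset;  // destination offset within an existing chunk
};

class DefragQueue {
public:
  void enqueue(Relocation r) {
    m_pending.push_back(std::move(r));
  }

  // Drain at most maxPerPass relocations per frame so that a single pass
  // cannot cause a frame time spike by moving thousands of resources at once.
  std::vector<Relocation> nextBatch(size_t maxPerPass = 64) {
    std::vector<Relocation> batch;
    while (!m_pending.empty() && batch.size() < maxPerPass) {
      batch.push_back(m_pending.front());
      m_pending.pop_front();
    }
    return batch;
  }

private:
  std::deque<Relocation> m_pending;
};
```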

What this does

As described in #4280, our memory allocator works on chunks of 256MB that are allocated from the system. The goal here is to make more efficient use of these chunks and actually return memory to the system if we have a lot of unused memory sitting around. This is especially important under memory pressure (even more so on Nvidia due to the need for dedicated allocations for e.g. render targets), or if the app in question isn't actually a game but rather a launcher that temporarily eats over 1GB of VRAM but frees most of it when minimized.
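
To make the chunk model concrete, here is a rough sketch of a chunk-based pool, assuming 256 MiB chunks as described above; the class and field names are made up for illustration and do not match DXVK's allocator:

```cpp
#include <cstdint>
#include <list>

constexpr uint64_t ChunkSize = 256ull << 20; // 256 MiB per chunk

struct Chunk {
  uint64_t used = 0;             // bytes currently suballocated from this chunk
  bool     acceptsAllocs = true; // cleared while the chunk is being drained
};

class MemoryPool {
public:
  // Suballocate from an existing chunk if possible, otherwise grow the pool.
  // (Oversized and dedicated allocations are ignored in this sketch.)
  Chunk* allocate(uint64_t size) {
    for (auto& c : m_chunks) {
      if (c.acceptsAllocs && c.used + size <= ChunkSize) {
        c.used += size;
        return &c;
      }
    }
    Chunk chunk;
    chunk.used = size;
    m_chunks.push_back(chunk);
    return &m_chunks.back();
  }

  // Chunks that became completely empty, e.g. because defragmentation moved
  // their last resources elsewhere, can be returned to the system.
  void freeEmptyChunks() {
    m_chunks.remove_if([](const Chunk& c) { return c.used == 0; });
  }

private:
  std::list<Chunk> m_chunks;
};
```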

As an example, Metaphor ReFantazio (the demo version in this case) allocates over 4GB of memory right at the start, just to free most of it right away. Because some small allocations are scattered all across the memory chunks, we still need to keep the full 4.6GB of memory around: (screenshot: Bildschirmfoto-696)

With defragmentation, we get a lot closer to what the game is actively using: (screenshot: Bildschirmfoto-697)

This only affects VRAM allocations that are not mapped into CPU address space. For CPU-accessible memory we require pointer stability, so moving those allocations around dynamically is not feasible; that said, most mapped allocations are short-lived anyway, so the problem usually solves itself.

The algorithm used here is very simple: we periodically look at the chunk with the lowest amount of memory in use and try to move its resources to existing chunks, while preventing the allocator from reusing that chunk until the memory is actually needed. This way, we essentially produce empty chunks which can subsequently be freed. While not optimal in any way, this generally seems to work well in practice.
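
A hedged sketch of that selection step, using hypothetical types rather than DXVK's internal interfaces: pick the live chunk with the least memory in use, stop handing out new allocations from it, and queue its resources for relocation.

```cpp
#include <cstdint>
#include <vector>

struct ChunkInfo {
  uint32_t chunkIndex;    // index into the pool
  uint64_t bytesUsed;     // bytes currently suballocated
  bool     acceptsAllocs; // false while the chunk is being drained
};

// Returns the index of the chunk to defragment next, or -1 if none qualifies.
int32_t pickDefragCandidate(const std::vector<ChunkInfo>& chunks) {
  int32_t best = -1;
  for (size_t i = 0; i < chunks.size(); i++) {
    const ChunkInfo& c = chunks[i];
    // Skip chunks that are already being drained as well as chunks that are
    // empty anyway; the latter can simply be freed.
    if (!c.acceptsAllocs || c.bytesUsed == 0)
      continue;
    if (best < 0 || c.bytesUsed < chunks[size_t(best)].bytesUsed)
      best = int32_t(i);
  }
  return best;
}
```

Once the candidate's resources have been copied into other chunks, the now-empty chunk can be released back to the system.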

What this does not (yet) do

We don't migrate any resources between VRAM and system memory yet. This is planned as a future PR and will likely be necessary in order to make Unity Engine games work better on cards with less than 12GB of VRAM (e.g. #4118).

This also means that if we currently allocate a resource in system memory, we will not move it back to VRAM even if enough space becomes available, so performance in those games is still going to be an issue.

doitsujin · Oct 18 '24 13:10