chaiNNer
Setting for the amount of (V)RAM used for upscaling
CPU upscaling & upscaling on Apple Silicon (CPU & GPU)
A value between 20% and 80% of the freely available memory can be chosen for upscaling. If desired, this percentage can be applied to the total system RAM instead of the freely available memory. If a user chooses to do so, a warning is shown, and the settings and the amount of RAM used are logged during upscaling.
For GPU upscaling, the amount of freely available VRAM to use can be set. This setting is only available on Windows and Linux.
MaxTileSize for NCNN on Apple Silicon has been added.
Should fix #876
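To make the settings described above concrete, here is a minimal sketch of how such a budget could be resolved (the function and parameter names are illustrative, not chaiNNer's actual API):

```python
import logging

import psutil

logger = logging.getLogger(__name__)


def resolve_ram_budget(percent: float, use_total_ram: bool) -> int:
    """Return the RAM budget (in bytes) to use for upscaling.

    `percent` is clamped to the allowed 20%-80% range. By default it
    applies to the freely available memory; with `use_total_ram` it
    applies to the entire system RAM instead, which is logged.
    """
    percent = min(max(percent, 20.0), 80.0)
    vm = psutil.virtual_memory()
    base = vm.total if use_total_ram else vm.available
    budget = int(base * percent / 100)
    if use_total_ram:
        logger.warning(
            "Applying %.1f%% to total system RAM: budget = %.2f GiB",
            percent,
            budget / 2**30,
        )
    return budget
```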
Just an idea.
I also implemented the logic for PyTorch.
To give you a general overview: this feature is currently Apple Silicon only. That is due to the unique memory architecture of newer Macs. Since there is no separation between RAM and VRAM, it is only necessary to set a single limit for the RAM usable for upscaling.
The idea is that the minimum amount of RAM left for the system is 8 GB (which means 8 GB Macs are more or less out of the question). The max amount of RAM that can be reserved for the system is 80% of the total system memory.
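Roughly, that constraint might look like the following sketch (under the stated 8 GB / 80% assumptions; `requested_reserve` is a hypothetical name):

```python
import psutil

MIN_SYSTEM_RESERVE = 8 * 2**30  # always leave at least 8 GiB for the system


def upscaling_ram_budget(requested_reserve: int) -> int:
    """RAM left for upscaling after reserving memory for the system.

    The reserve is clamped between 8 GiB and 80% of total memory, so
    upscaling always gets at least 20% of the system RAM.
    """
    total = psutil.virtual_memory().total
    reserve = min(max(requested_reserve, MIN_SYSTEM_RESERVE), int(total * 0.8))
    return total - reserve
```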
I did some testing on my 32 GB Mac, and it looks promising.
RAM usage with 8 GB of reserved system memory:
RAM usage with 25.6 GB (80 %) of reserved system memory:
I'd also like to implement this for NCNN and would reuse some code snippets from #2070. Haven't looked into it yet, but is there something similar for ONNX?
In general, I think this approach could be reused for Windows/Linux. But I assume there would need to be a second input field for VRAM. Furthermore, the description would need to be reworded there, since RAM usage on those platforms is coupled to the CPU, and the reserved system memory is basically only used for CPU upscaling.
@RunDevelopment @joeyballentine, could I get some feedback from you?
I'd also like to implement this for NCNN and would reuse some code snippets from https://github.com/chaiNNer-org/chaiNNer/pull/2070. Haven't looked into it yet, but is there something similar for ONNX?
We don't support estimated tile sizes for ONNX. We just never implemented it.
As for NCNN: it's complicated. We do estimate for NCNN, but that estimation frequently crashed the backend on Macs for some reason, so I turned it off (#2006). It will now always estimate a tile size of 256 on Mac. This is a good default, because it only needs 3–4 GB of (V)RAM for most models.
That is due to the unique memory architecture of newer Macs. Since there is no separation between RAM and VRAM, it is only necessary to set a single limit for the RAM usable for upscaling.
That's not unique to macs, anyone upscaling using integrated graphics via NCNN will have the same problem. And of course, anyone upscaling on CPU will only use RAM.
That's not unique to macs, anyone upscaling using integrated graphics via NCNN will have the same problem.
Are you sure? From my understanding, on x86, the integrated GPUs get a share of the main memory. This memory will then be subtracted from the main memory available for the CPU.
e.g., 32 GB total memory with 8 GB for the GPU means the CPU will receive 24 GB of RAM.
On the Mac, that is different because the GPU can access 100% of the RAM.
@joeyballentine
Here is a comprehensive comparison:
https://www.sir-apfelot.de/en/compare-shared-memory-unified-memory-41023/
I think Intel integrated graphics also uses the unified memory model. https://www.intel.com/content/www/us/en/support/articles/000020962/graphics.html
Interesting. We would need someone with such a system to test this.
For testing, I added a new switch that overrides the memory settings and uses 80% of the available memory instead. Any thoughts?
After some thorough thinking, I realized that this would give users far too much rope to hang themselves with. Now for a saner approach.
By default, users can choose a percentage of the freely available memory (20%–80%). If they really want to live on the edge, there is now an extra option that applies the percentage to the total available system memory instead.
Added the calculation of the NCNN MaxTileSize on Apple Silicon Macs. Left it at 256 for Intel Macs, though.
It seems `vkdev.get_heap_budget` was reporting values that were far too high. On my system, with `vkdev.get_heap_budget`:
[2023-08-14 22:49:01.734] [info] Backend: [92171] [INFO] Estimating memory required: 170.55 GB, 17.07 GB available. Estimated tile size: 1024
And with `psutil.virtual_memory().available` and max memory set to 80%:
[2023-08-14 22:44:33.013] [info] Backend: [88582] [INFO] Estimating memory required: 170.55 GB, 10.78 GB available. Estimated tile size: 512
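For context, here is a rough sketch of how a power-of-two tile size can be derived from an available-memory estimate like the ones in these logs; the linear area-scaling model is an assumption, not necessarily chaiNNer's exact estimator:

```python
def estimate_tile_size(
    img_w: int,
    img_h: int,
    required_full: float,  # estimated bytes needed to upscale the whole image
    available: float,      # memory budget in bytes
    max_tile: int = 1024,
    min_tile: int = 32,
) -> int:
    """Halve a power-of-two tile size until a single tile fits the budget.

    Assumes memory use scales linearly with tile area, so a tile with
    side `t` needs roughly required_full * t * t / (img_w * img_h) bytes.
    """
    bytes_per_pixel = required_full / (img_w * img_h)
    tile = max_tile
    while tile > min_tile and tile * tile * bytes_per_pixel > available:
        tile //= 2
    return tile
```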
Can you clarify how this PR compares to #2070? At first glance it looks like it may have some similar logic (but possibly used in different places?).
@JeremyRand
Of course. This one is foremost for Apple Silicon Macs, at least for the moment. But it could be extended to other platforms.
It is not only meant for NCNN upscaling but also for PyTorch. I'm also working on another PR that builds upon this one that implements tile size estimation for ONNX.
Since one does not have to differentiate between video and system memory on an Apple Silicon Mac, only one setting is needed.
In general, this is a broader approach that does not focus on one specific backend.
I see. Maybe there's some potential for reusing some of this for #2070's intended use case (CPU inference)? I think for my use cases in #2070, this UX would almost do what I want, except that I'd probably want the ability to use a percentage of total system RAM rather than free system RAM.
(Using a percentage of RAM rather than an explicit number as #2070 does is probably better UX.)
except that I'd probably want the ability to use a percentage of total system RAM rather than free system RAM.
That is what the switch is used for. By default, it only uses free/available RAM. But toggling that switch applies the percentage values to the total system RAM instead.
It also shows the user how much system RAM would be used at max if this option was set.
Oh, I see. That wasn't obvious to me from the screenshot. Yeah, so this UX seems better than #2070 for my use cases. I'd probably be OK with putting #2070 on hold until this is merged, and then I could rebase #2070 so that it just applies your UI's settings to CPU inference too. Thoughts?
Oh, I see. That wasn't obvious to me from the screenshot.
Maybe I need to reword this. Instead of "Use total system memory", something like "Apply % to total system memory".
Maybe also helpful to show the amount of free RAM, e.g. "Up to 22.4 GB out of a total of 32 GB RAM (26 GB free) will be used."
Also, a minor nit: GB is not really the right unit for this; GiB is preferred for this purpose. (GB is a power of 1000, GiB a power of 1024, per international standards.)
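A quick illustration of the difference:

```python
n = 32 * 2**30      # 32 GiB in bytes
print(n / 1000**3)  # 34.359738368 -> "34.36 GB"  (decimal, powers of 1000)
print(n / 1024**3)  # 32.0         -> "32.0 GiB"  (binary, powers of 1024)
```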
Maybe also helpful to show the amount of free RAM, e.g. "Up to 22.4 GB out of a total of 32 GB RAM (26 GB free) will be used."
I was thinking about this, but free RAM differs so much that showing the user a number would let them assume it might always be the displayed value. That's why I didn't add it.
Also, a minor nit: GB is not really the right unit for this; GiB is preferred for this purpose. (GB is a power of 1000, GiB a power of 1024, per international standards.)
Done
I was thinking about this, but free RAM differs so much that showing the user a number would let them assume it might always be the displayed value. That's why I didn't add it.
Yeah that's fair I guess.
How would you feel about making the percentage precise to 0.1% rather than 1%? On systems with a lot of RAM (my main system has around 300 GiB), being more precise than 1% would be nice. I realize the power-of-2 tile size estimation itself introduces more imprecision than that, but that's an orthogonal issue that will hopefully be resolved separately.
How would you feel about making the percentage precise to 0.1% rather than 1%?
Done.
This isn't Apple Silicon only anymore.
RAM settings now also apply for CPU upscaling with PyTorch on all platforms.
@JeremyRand you only need to add your CPU settings for NCNN. All the other bits are already in place.
Very nice, I'll rebase #2070 after this is merged.
Okay, so basically every bit of memory usage is now configurable. I added an option to set the amount of VRAM that should be used. Not used on Macs, though.
Here's a screenshot:
Can't test this myself, though, since I don't have a PC.
Okay, so here are my thoughts on the settings:
- I like the use of percentages.
- I don't like the "Use total system memory" toggle. Just allow 100%. Why split one number into 2 settings?
- I think the "X GB out of a total of Y GB RAM" part should be part of the RAM percentage setting.
Thanks for your feedback. I think the main problem was the wording of the settings. I changed this. (screenshot at the end)
Let me answer your question from back to front:
- I think the "X GB out of a total of Y GB RAM" part should be part of the RAM percentage setting.
@JeremyRand also asked for this, but the thing is, free/available RAM is very volatile. So if we displayed a value there, it would only be true for that single point in time when looking at the settings. The next time you open the settings, the values would be different. It could be 8 GiB one time and 14 GiB the next.
- I don't like the "Use total system memory" toggle. Just allow 100%. Why split one number into 2 settings?
The idea here is that instead of allocating a max percentage of the free memory (`psutil.virtual_memory().available`), we apply this percentage to the total memory (`psutil.virtual_memory().total`). These values usually vary a lot; e.g., on my system, 80% of the total system memory would be 25.6 GiB, but 80% of the free/available memory under my typical usage is only around 12 GiB.
I hope the rewording makes this clear.