
Setting for the amount of (V)RAM used for upscaling

Open stonerl opened this issue 1 year ago • 49 comments

CPU upscaling & upscaling on Apple Silicon (CPU & GPU)

A value between 20% and 80% of the freely available memory can be chosen for upscaling. If desired, the percentage can be applied to the total available RAM instead of the freely available memory. If a user chooses to do so, a warning is presented, and during upscaling the settings and the amount of RAM used are logged.

For GPU upscaling, the amount of freely available VRAM can be set. This setting is only available on Windows and Linux.
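
For illustration, here is a minimal sketch of how such a budget could be derived; `percent` and `use_total` are hypothetical stand-ins, not the actual setting names in this PR:

```python
# Minimal sketch, not the implementation in this PR.
import psutil


def upscale_ram_budget(percent: float, use_total: bool = False) -> int:
    """Return the number of bytes the upscaler may use.

    percent:   the chosen value between 20 and 80
    use_total: apply the percentage to total RAM instead of free RAM
    """
    mem = psutil.virtual_memory()
    base = mem.total if use_total else mem.available
    return int(base * percent / 100)
```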

MaxTileSize for NCNN on Apple Silicon has been added.

Should fix #876

stonerl avatar Aug 13 '23 16:08 stonerl

Just an idea.

CleanShot 2023-08-13 at 22 55 51

stonerl avatar Aug 13 '23 16:08 stonerl

I also implemented the logic for PyTorch.

To give you a general overview: this feature is currently Apple Silicon only. That is due to the unique memory architecture on newer Macs. Since there is no separation between RAM and VRAM, it is only necessary to set a general limit on the RAM usable for upscaling.

The idea is that the minimum amount of RAM left for the system is 8 GB (which means 8 GB Macs are more or less out of the question). The max amount of RAM that can be reserved for the system is 80% of the total system memory.
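
To make the bounds concrete, here is a rough sketch of that clamping logic (illustrative names, not the code in this PR):

```python
# Rough sketch, assuming the memory reserved for the system is clamped to
# the range [8 GiB, 80% of total RAM]; names are illustrative.
import psutil

GIB = 1024 ** 3


def usable_ram(reserved_for_system: int) -> int:
    """RAM left for upscaling after reserving memory for the system."""
    total = psutil.virtual_memory().total
    reserved = max(8 * GIB, min(reserved_for_system, int(total * 0.8)))
    return total - reserved
```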

I did some testing on my 32 GB Mac, and it looks promising.

RAM usage with 8 GB of reserved system memory:

CleanShot 2023-08-14 at 11 44 50@2x

RAM usage with 25.6 GB (80 %) of reserved system memory:

CleanShot 2023-08-14 at 11 43 33@2x

I'd also like to implement this for NCNN and would reuse some code snippets from #2070. Haven't looked into it yet, but is there something similar for ONNX?

In general, I think this approach could be reused for Windows/Linux. But I assume there would need to be a second input field for VRAM. Furthermore, the description would need to be reworded there, since RAM usage is coupled to the CPU and the reserved system memory is basically only used for CPU upscaling.

@RunDevelopment @joeyballentine, could I get some feedback from you?

stonerl avatar Aug 14 '23 10:08 stonerl

I'd also like to implement this for NCNN and would reuse some code snippets from https://github.com/chaiNNer-org/chaiNNer/pull/2070. Haven't looked into it yet, but is there something similar for ONNX?

We don't support estimated tile sizes for ONNX. We just never implemented it.

As for NCNN: it's complicated. We do estimate for NCNN, but that estimation frequently crashed the backend on Macs for some reason, so I turned it off (#2006). It will now always estimate a tile size of 256 on Mac. This is a good default, because it only needs 3–4 GB of (V)RAM for most models.

RunDevelopment avatar Aug 14 '23 11:08 RunDevelopment

That is due to the unique memory architecture on newer Macs. Since there is no separation between RAM and VRAM, it is only necessary to set a general limit on the RAM usable for upscaling.

That's not unique to macs, anyone upscaling using integrated graphics via NCNN will have the same problem. And of course, anyone upscaling on CPU will only use RAM.

joeyballentine avatar Aug 14 '23 12:08 joeyballentine

That's not unique to macs, anyone upscaling using integrated graphics via NCNN will have the same problem.

Are you sure? From my understanding, on x86, the integrated GPUs get a share of the main memory. This memory will then be subtracted from the main memory available for the CPU.

e.g., 32 GB Total Memory, 8 GB for the GPU, means the CPU will receive 24 GB of RAM.

On the Mac, that is different because the GPU can use 100% of the RAM.

stonerl avatar Aug 14 '23 12:08 stonerl

@joeyballentine

Here is a comprehensive comparison:

https://www.sir-apfelot.de/en/compare-shared-memory-unified-memory-41023/

stonerl avatar Aug 14 '23 12:08 stonerl

I think Intel integrated graphics also uses the unified memory model. https://www.intel.com/content/www/us/en/support/articles/000020962/graphics.html

RunDevelopment avatar Aug 14 '23 13:08 RunDevelopment

Interesting. We would need someone with such a system to test this.

stonerl avatar Aug 14 '23 15:08 stonerl

For testing, I added a new switch that overrides the memory settings and uses 80% of the available memory instead. Any thoughts?

CleanShot 2023-08-14 at 17 37 02@2x

stonerl avatar Aug 14 '23 15:08 stonerl

After some thorough thinking, I realized that users would have far too much rope to hang themselves with. Here's a saner approach.

By default, users can choose a percentage of the freely available memory (20% – 80%). If they really want to live on the edge, there is now an extra option to apply that percentage to the total available system memory instead.

CleanShot 2023-08-14 at 21 45 50@2x

stonerl avatar Aug 14 '23 19:08 stonerl

Added a calculation for the NCNN MaxTileSize on Apple Silicon Macs. Left it at 256 for Intel Macs, though.

It seems vkdev.get_heap_budget was reporting values that were far too high.

On my system with vkdev.get_heap_budget:

[2023-08-14 22:49:01.734] [info]  Backend: [92171] [INFO] Estimating memory required: 170.55 GB, 17.07 GB available. Estimated tile size: 1024

And with psutil.virtual_memory().available and the max memory set to 80%:

[2023-08-14 22:44:33.013] [info]  Backend: [88582] [INFO] Estimating memory required: 170.55 GB, 10.78 GB available. Estimated tile size: 512
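
For context, the general shape of such an estimate could look like this. This is only a rough sketch of the idea, not chaiNNer's actual estimator, and `mem_per_256_tile` is a hypothetical per-model figure; memory is assumed to scale with tile area:

```python
# Rough sketch of the general idea, not chaiNNer's actual estimator.
def estimate_tile_size(mem_per_256_tile: float, budget: float,
                       max_tile: int = 1024, min_tile: int = 128) -> int:
    """Pick the largest power-of-two tile whose estimated memory fits the budget."""
    tile = max_tile
    while tile > min_tile:
        # memory is assumed to grow roughly with tile area
        estimated = mem_per_256_tile * (tile / 256) ** 2
        if estimated <= budget:
            return tile
        tile //= 2
    return min_tile
```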

stonerl avatar Aug 14 '23 21:08 stonerl

Can you clarify how this PR compares to #2070? At first glance it looks like it may have some similar logic (but possibly used in different places?).

JeremyRand avatar Aug 15 '23 13:08 JeremyRand

@JeremyRand

Of course. This one is primarily for Apple Silicon Macs, at least for the moment. But it could be extended to other platforms.

It is not only meant for NCNN upscaling but also for PyTorch. I'm also working on another PR that builds upon this one and implements tile size estimation for ONNX.

Since there is no need to differentiate between video and system memory on an Apple Silicon Mac, only one setting is needed.

In general, this is a broader approach that does not focus on one specific backend.

stonerl avatar Aug 15 '23 13:08 stonerl

I see. Maybe there's some potential for reusing some of this for #2070's intended use case (CPU inference)? I think for my use cases in #2070, this UX would almost do what I want, except that I'd probably want the ability to use a percentage of total system RAM rather than free system RAM.

JeremyRand avatar Aug 15 '23 13:08 JeremyRand

(Using a percentage of RAM rather than an explicit number as #2070 does is probably better UX.)

JeremyRand avatar Aug 15 '23 13:08 JeremyRand

except that I'd probably want the ability to use a percentage of total system RAM rather than free system RAM.

That is what the switch is used for. By default, it only uses free/available RAM. Toggling that switch applies the percentage values to the total system RAM instead.

stonerl avatar Aug 15 '23 13:08 stonerl

It also shows the user how much system RAM would be used at most if this option were set.

CleanShot 2023-08-15 at 15 50 29

stonerl avatar Aug 15 '23 13:08 stonerl

Oh, I see. That wasn't obvious to me from the screenshot. Yeah, so this UX seems better than #2070 for my use cases. I'd probably be OK with putting #2070 on hold until this is merged, and then I could rebase #2070 so that it just applies your UI's settings to CPU inference too. Thoughts?

JeremyRand avatar Aug 15 '23 13:08 JeremyRand

Oh, I see. That wasn't obvious to me from the screenshot.

Maybe I need to reword this. Instead of "Use total system memory", something like "Apply % to total system memory".

stonerl avatar Aug 15 '23 14:08 stonerl

Oh, I see. That wasn't obvious to me from the screenshot.

Maybe I need to reword this. Instead of "Use total system memory", something like "Apply % to total system memory".

Maybe also helpful to show the amount of free RAM, e.g. "Up to 22.4 GB out of a total of 32 GB RAM (26 GB free) will be used."

Also, a minor nit: GB is not really the right unit for this; GiB is preferred for this purpose. (GB is based on powers of 1000, GiB on powers of 1024, per international standards.)
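
To illustrate the difference:

```python
# Quick illustration of GB (decimal prefix) vs. GiB (binary prefix).
def to_gib(n_bytes: int) -> float:
    return n_bytes / 1024 ** 3   # GiB: 2**30 bytes


def to_gb(n_bytes: int) -> float:
    return n_bytes / 1000 ** 3   # GB: 10**9 bytes


ram = 32 * 1024 ** 3             # a "32 GB" machine actually has 32 GiB
print(to_gib(ram))               # 32.0
print(to_gb(ram))                # ~34.36
```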

JeremyRand avatar Aug 15 '23 14:08 JeremyRand

Maybe also helpful to show the amount of free RAM, e.g. "Up to 22.4 GB out of a total of 32 GB RAM (26 GB free) will be used."

I was thinking about this, but free RAM varies so much that showing a number would lead users to assume it will always be the displayed value. That's why I didn't add it.

Also, a minor nit: GB is not really the right unit for this; GiB is preferred for this purpose. (GB is based on powers of 1000, GiB on powers of 1024, per international standards.)

Done

stonerl avatar Aug 15 '23 14:08 stonerl

I was thinking about this, but free RAM varies so much that showing a number would lead users to assume it will always be the displayed value. That's why I didn't add it.

Yeah that's fair I guess.

JeremyRand avatar Aug 15 '23 14:08 JeremyRand

How would you feel about making the percentage precise to 0.1% rather than 1%? On systems with a lot of RAM (my main system has around 300 GiB), being more precise than 1% would be nice. I realize the power-of-2 tile size estimation itself introduces more imprecision than that, but that's an orthogonal issue that will hopefully be resolved separately.

JeremyRand avatar Aug 15 '23 14:08 JeremyRand

How would you feel about making the percentage precise to 0.1% rather than 1%?

Done.

CleanShot 2023-08-15 at 17 45 12@2x

stonerl avatar Aug 15 '23 15:08 stonerl

This isn't Apple Silicon only anymore.

RAM settings now also apply to CPU upscaling with PyTorch on all platforms.

@JeremyRand you only need to add your CPU settings for NCNN. All the other bits are already in place.

stonerl avatar Aug 15 '23 16:08 stonerl

Very nice, I'll rebase #2070 after this is merged.

JeremyRand avatar Aug 15 '23 17:08 JeremyRand

Okay, so basically every bit of memory usage is now configurable. I added an option to set the amount of VRAM that should be used. Not used on Macs, though.

Here's a screenshot:

CleanShot 2023-08-15 at 19 47 09@2x

stonerl avatar Aug 15 '23 17:08 stonerl

Can't test this myself, though, since I don't have a PC.

stonerl avatar Aug 15 '23 17:08 stonerl

Okay, so here are my thoughts on the settings:

  1. I like the use of percentages.
  2. I don't like the "Use total system memory" toggle. Just allow 100%. Why split one number into 2 settings?
  3. I think the "X GB out of a total of Y GB RAM" part should be part of the RAM percentage setting.

RunDevelopment avatar Aug 16 '23 10:08 RunDevelopment

Thanks for your feedback. I think the main problem was the wording of the settings, which I have now changed. (Screenshot at the end.)

Let me answer your points from back to front:

  3. I think the "X GB out of a total of Y GB RAM" part should be part of the RAM percentage setting.

@JeremyRand also asked for this, but the thing is, free/available RAM is very volatile. If we displayed a value there, it would only be true for the single moment the settings are opened. The next time you open the settings, the value would be different; it could be 8 GiB one time and 14 GiB the next.

  2. I don't like the "Use total system memory" toggle. Just allow 100%. Why split one number into 2 settings?

The idea here is that instead of allocating a maximum percentage of free memory (psutil.virtual_memory().available), we apply the percentage to the total available memory (psutil.virtual_memory().total). These values usually differ a lot: on my system, 80% of total system memory would be 25.6 GiB, but 80% of free/available memory under my typical workload would be around 12 GiB.
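
As a quick illustration of how far apart the two modes can be (the comments echo the example values above):

```python
# Quick illustration of the two modes at 80%.
import psutil

mem = psutil.virtual_memory()
limit_total = int(mem.total * 0.8)      # e.g. 25.6 GiB on a 32 GiB machine
limit_free = int(mem.available * 0.8)   # e.g. ~12 GiB under typical load
```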

I hope the rewording makes this clear.

CleanShot 2023-08-16 at 13 10 23@2x

stonerl avatar Aug 16 '23 12:08 stonerl