text-generation-webui Add autosplit reserve parameter for exllamav2

Open Panchovix opened this issue 11 months ago • 0 comments

Hi! I was wondering whether the autosplit reserve parameter from exllamav2 could be added as a config option when loading a model.

This lets you set an amount of VRAM to reserve on each GPU when using gpu-split, so the loader doesn't try to put most of the model on GPU 0 (for example).

With 2 GPUs, you can use [1024,1024] so that each GPU has 1 GB free after loading the model.
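
To illustrate, here is a minimal sketch of how a front end might turn such a setting into per-GPU values. It assumes the setting is given in MB and that the loader expects a per-device list in bytes; both are assumptions, as is the helper name `reserve_mb_to_bytes`, which is hypothetical and not part of exllamav2 or text-generation-webui.

```python
MIB = 1024 ** 2  # bytes per MB (binary), assumed unit for the user-facing setting


def reserve_mb_to_bytes(reserve_mb, num_gpus):
    """Convert a per-GPU reserve list from MB to bytes.

    Hypothetical helper: GPUs without an explicit entry get a reserve of 0.
    """
    if len(reserve_mb) > num_gpus:
        raise ValueError("more reserve entries than GPUs")
    if any(mb < 0 for mb in reserve_mb):
        raise ValueError("reserve values must be non-negative")
    # Pad with zeros so the list has exactly one entry per GPU.
    padded = list(reserve_mb) + [0] * (num_gpus - len(reserve_mb))
    return [mb * MIB for mb in padded]


# Two GPUs, 1 GB (1024 MB) kept free on each.
reserve_bytes = reserve_mb_to_bytes([1024, 1024], num_gpus=2)
print(reserve_bytes)
```

The resulting list is what would then be handed to the loader's reserve argument, if it does take bytes per device.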

Line related: https://github.com/turboderp/exllamav2/blob/master/exllamav2/model.py#L373

Thanks!

Mar 25 '24 03:03 Panchovix