text-generation-webui
Add autosplit reserve parameter for exllamav2
Hi! I was wondering whether the autosplit reserve parameter from exllamav2 could be exposed as a config option when loading a model.
It lets you set an amount of VRAM to keep free on each GPU when using gpu-split, so the loader doesn't try to put most of the model on GPU 0 (for example).
With 2 GPUs, you could use [1024,1024] so that each GPU has 1 GB free after loading the model.
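To make the request concrete, here is a minimal sketch of how a loader option could turn a per-GPU value like `1024,1024` into a list. The function name `parse_autosplit_reserve` and the comma-separated, megabyte-based input format are hypothetical and not part of either project. The conversion to bytes is also an assumption about what exllamav2 expects, so it would need checking against the linked line.

```python
from typing import List

MIB = 1024 ** 2  # bytes per mebibyte


def parse_autosplit_reserve(value: str) -> List[int]:
    """Parse a comma-separated per-GPU reserve (in MB) into bytes.

    Hypothetical helper: the option format and the unit conversion
    are assumptions made for illustration only.
    """
    reserve_bytes = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            # Tolerate trailing commas and empty input.
            continue
        mb = int(part)
        if mb < 0:
            raise ValueError(f"reserve must be non-negative, got {mb}")
        reserve_bytes.append(mb * MIB)
    return reserve_bytes


if __name__ == "__main__":
    # Two GPUs, 1 GB reserved on each.
    print(parse_autosplit_reserve("1024,1024"))
```

The resulting list would then be handed to whatever argument exllamav2 uses for the autosplit reserve when the model is loaded.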
Related line: https://github.com/turboderp/exllamav2/blob/master/exllamav2/model.py#L373
Thanks!