stable-diffusion-webui
[Feature Request]: Load big model to main RAM and reduce for VRAM
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What would your feature do?
Add a command-line argument that loads a big model into main RAM and, based on the prompt and a configured maximum memory usage, sends a reduced ("biased") model to VRAM.
Proposed workflow
- Invoke the web UI with the arguments --bias-to-prompt --max-mem=6GB
- The model is loaded into main RAM and biased (reduced) for the prompt
- The reduced model is sent to the GPU
- An image is produced from the biased model in VRAM (see the offloading sketch below)
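For what it's worth, something in the spirit of steps 2-4 can already be sketched outside the web UI with the existing diffusers + accelerate CPU-offload machinery; the snippet below is only an illustration under that assumption (the --bias-to-prompt / --max-mem flags are the proposal and exist nowhere, and the prompt-based "biasing" step has no existing counterpart, so it is not shown).

```python
# Rough sketch of the load-to-RAM / offload-to-VRAM idea using diffusers +
# accelerate, outside the web UI. The model id and prompt are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)

# Weights stay in main RAM; each submodule (text encoder, UNet, VAE) is moved
# to VRAM only while it is running, then returned to RAM. This keeps peak VRAM
# usage low enough for ~8 GB cards, at the cost of extra PCIe transfers.
pipe.enable_model_cpu_offload()

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("out.png")
```

The trade-off is exactly the one raised in the comment further down: every generation pays for moving weights between CPU RAM and VRAM.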
Additional information
This is intended for GPUs with at most 8 GB of VRAM. I guess transformers could be used for this, but I am only guessing. This way we could use bigger models on 8 GB GPUs.
In case it's of any use, ChatGPT says this can be done on CPU:
- Pruning: Prune the model to remove unnecessary connections or weights, reducing its size and memory footprint.
- Chunking: Divide the pruned model into smaller modules or components to facilitate dynamic loading and execution of only the necessary parts.
- Parameter Sharing: Apply weight sharing techniques to exploit redundancy within the model architecture, further reducing the number of unique parameters.
- Dynamic Graph Execution: Use dynamic computation graphs to construct and execute only the relevant parts of the model graph during inference, minimizing memory allocation for unused portions.
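As a rough illustration of the "Pruning" bullet only, here is a minimal PyTorch sketch; the toy linear layer and the 30% sparsity level are arbitrary assumptions, and unstructured pruning like this just zeroes weights, so on its own it does not shrink the VRAM footprint of a Stable Diffusion checkpoint.

```python
# Minimal, hypothetical illustration of unstructured magnitude pruning in
# PyTorch. The toy linear layer and the 30% sparsity level are arbitrary;
# zeroed weights are still stored, so this alone does not reduce memory use.
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(512, 512)

# Zero out the 30% of weights with the smallest L1 magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Make the pruning permanent (drops the re-parametrization hooks).
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"Fraction of zeroed weights: {sparsity:.2%}")
```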
I mean, ChatGPT doesn't really have all the code context or anything... Loading the model into RAM is possible, but having the GPU exchange data with CPU RAM is really expensive in time and performance, especially for a ~10 GB checkpoint/model. I'm not a contributor to the project, so I'm just giving my two cents here, but I don't think this is a particularly good idea, and/or it's not really that easy.