LLM-VM
Add disk offloading for onsite LLM inference
Enable large models that can't fit on the GPU to run inference by moving parameters back and forth between system RAM (and disk) and GPU memory.
This is what vLLM does: https://github.com/vllm-project/vllm
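
As a rough starting point, here is a minimal sketch of this kind of offloading using Hugging Face Transformers with Accelerate's `device_map="auto"` plus an `offload_folder`, which fills the GPU first and spills remaining weights to CPU RAM and then to disk. The model name and folder path below are placeholders, not part of this issue, and this is only one possible way to implement the feature.

```python
# Minimal sketch of RAM/disk offloading for inference, assuming the
# Hugging Face Transformers + Accelerate stack is available.
# Model name and paths are placeholders, not part of this issue.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-7b1"  # hypothetical example model

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets Accelerate place as many layers as fit on the
# GPU, spilling the rest to CPU RAM and finally to disk (offload_folder).
# During the forward pass, offloaded weights are streamed back onto the
# GPU layer by layer, so models larger than VRAM can still run.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    offload_folder="./offload",   # disk staging area for offloaded weights
    offload_state_dict=True,      # also offload during loading to limit RAM spikes
)

# Inputs go to the GPU, where the first layers live.
inputs = tokenizer("Disk offloading lets us run", return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that this path depends on `accelerate` being installed alongside `transformers`, and throughput is bounded by PCIe and disk bandwidth since weights are streamed in layer by layer; whether LLM-VM should wrap this or implement its own offloading is part of what this issue should decide.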