LLM-VM
Add disk offloading for onsite LLM inference
Enable large models that can't fit on the GPU to run inference by moving parameters back and forth between system RAM (and disk) and GPU memory.
This is what vLLM does: https://github.com/vllm-project/vllm
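
As a rough starting point, here is a minimal sketch of this kind of offloading using Hugging Face Transformers with Accelerate's `device_map="auto"` plus an `offload_folder`, which fills the GPU first and spills remaining weights to CPU RAM and then to disk. The model name and folder path below are placeholders, not part of this issue, and this is only one possible way to implement the feature.

```python
# Minimal sketch of RAM/disk offloading for inference, assuming the
# Hugging Face Transformers + Accelerate stack is available.
# Model name and paths are placeholders, not part of this issue.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-7b1"  # hypothetical example model

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets Accelerate place as many layers as fit on the
# GPU, spilling the rest to CPU RAM and finally to disk (offload_folder).
# During the forward pass, offloaded weights are streamed back onto the
# GPU layer by layer, so models larger than VRAM can still run.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    offload_folder="./offload",   # disk staging area for offloaded weights
    offload_state_dict=True,      # also offload during loading to limit RAM spikes
)

# Inputs go to the GPU, where the first layers live.
inputs = tokenizer("Disk offloading lets us run", return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that this path depends on `accelerate` being installed alongside `transformers`, and throughput is bounded by PCIe and disk bandwidth since weights are streamed in layer by layer; whether LLM-VM should wrap this or implement its own offloading is part of what this issue should decide.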