
Add disk offloading for onsite LLM inference

Open VictorOdede opened this issue 1 year ago • 1 comment

Enable inference for large models that can't fit in GPU memory by passing their parameters back and forth between CPU RAM (or disk) and GPU RAM, loading only the layers currently being executed onto the GPU.
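
For context, the basic mechanism being asked for can be sketched in plain PyTorch with forward hooks that shuttle each layer's weights onto the GPU only for the duration of its forward pass. The helper and toy model below are purely illustrative and not part of LLM-VM; production offloaders (e.g. Accelerate, DeepSpeed) additionally use pinned memory and prefetching to hide transfer latency.

```python
import torch
import torch.nn as nn

def offload_layers(model: nn.Module, device: str = "cuda") -> nn.Module:
    """Keep every layer's weights in CPU RAM and copy them to GPU RAM only
    while that layer runs. Naive sketch: real offloaders overlap copies with
    compute instead of blocking on each transfer like this."""

    def to_gpu(module, inputs):
        module.to(device)                        # params: RAM -> GPU RAM
        return tuple(t.to(device) for t in inputs)

    def to_cpu(module, inputs, output):
        module.to("cpu")                         # params: GPU RAM -> RAM
        return output

    for layer in model.children():               # per-layer granularity for the sketch
        layer.register_forward_pre_hook(to_gpu)
        layer.register_forward_hook(to_cpu)
    return model

# Toy usage: the model never fully resides on the GPU at any point.
if torch.cuda.is_available():
    toy = offload_layers(nn.Sequential(nn.Linear(1024, 4096),
                                       nn.ReLU(),
                                       nn.Linear(4096, 1024)))
    out = toy(torch.randn(1, 1024))              # output lands on the GPU
```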

VictorOdede · Sep 04 '23 15:09

This is what vLLM does: https://github.com/vllm-project/vllm

mmirman · Sep 04 '23 17:09