NeMo-Guardrails
CUDA out of memory when using Lynx 70B
Hello! I'm running NeMo Guardrails on Google Colab with a T4 GPU. However, when I deploy Lynx 70B with this command:
!python -m vllm.entrypoints.openai.api_server --port 5000 --model 'PatronusAI/Patronus-Lynx-70B-Instruct'
I get a CUDA out-of-memory error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU
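For scale, here is a rough back-of-envelope estimate of the memory the weights alone would need. This is only a sketch: the 70 billion parameter count is taken from the model name, the 16 GiB figure is the nominal T4 memory (Colab exposes a bit less), and the helper `weight_memory_gib` is a made-up name for illustration. It ignores the KV cache, activations, and runtime overhead, so real usage would be higher.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed to hold the model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


PARAMS_70B = 70e9   # assumption: ~70 billion parameters, inferred from the model name
T4_MEMORY_GIB = 16  # assumption: nominal T4 memory; Colab reports slightly less

# Compare a few common weight precisions against the single T4.
for label, nbytes in [("fp16/bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    need = weight_memory_gib(PARAMS_70B, nbytes)
    print(f"{label}: ~{need:.0f} GiB for weights vs {T4_MEMORY_GIB} GiB on the GPU")
```

If these assumptions hold, the 16-bit weights come to roughly 130 GiB, and even a 4-bit version would be about twice what a single T4 offers, so the error may simply mean the model is too large for this GPU.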
Does anyone know what I can do?