NeMo-Guardrails Using Lynx 70B Cuda out of memory

Open sjay8 opened this issue 1 year ago • 1 comments

Hello! I'm running NeMo Guardrails on Google Colab with a T4 GPU. However, when I deploy Lynx 70B with this command:

    !python -m vllm.entrypoints.openai.api_server --port 5000 --model 'PatronusAI/Patronus-Lynx-70B-Instruct'

I get a CUDA out-of-memory error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU

Does anyone know what I can do?

Aug 01 '24 18:08 sjay8
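
For context on why this error is expected, here is a rough back-of-the-envelope estimate of the memory needed just to hold the model weights. It is only a sketch under stated assumptions: roughly 70 billion parameters (taken from the "70B" in the model name) and a T4 with 16 GB of memory. The helper name `weight_memory_gib` is made up for illustration, and the estimate ignores the KV cache and activations, which need additional memory on top.

```python
# Rough estimate of GPU memory needed for model weights alone.
# Assumptions (not stated in the original post): ~70e9 parameters,
# and a T4 with a nominal 16 GB of memory.

def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the weight footprint in GiB for a given precision."""
    return num_params * bytes_per_param / 1024**3

NUM_PARAMS = 70e9      # assumed from the "70B" in the model name
T4_MEMORY_GIB = 16     # nominal T4 capacity; usable memory is somewhat less

for label, nbytes in [("fp16/bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    need = weight_memory_gib(NUM_PARAMS, nbytes)
    verdict = "fits" if need <= T4_MEMORY_GIB else "does not fit"
    print(f"{label:>9}: ~{need:.0f} GiB of weights, {verdict} in {T4_MEMORY_GIB} GiB")
```

Under these assumptions, even aggressive 4-bit quantization leaves the weights at about twice the capacity of a single T4, so the realistic options are likely a much larger GPU (or several), or a smaller model.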
