
CUDA out of memory - Benjamin.

Open lipere123 opened this issue 1 year ago • 2 comments

./exo-cli-3.1-70b.sh hello

Go for :

#!/bin/bash
/usr/bin/curl --progress-bar --connect-timeout 1800 --max-time 1800 http://edgenode2:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "llama-3.1-70b", "messages": [{"role": "user", "content": "hello"}], "temperature": 0.7 }'

{"detail": "Error processing prompt (see logs with DEBUG>=2): <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "Unexpected <class 'RuntimeError'>: CUDA Error 2, out of memory"\n\tdebug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-10-10T05:40:20.356329187+00:00", grpc_status:2, grpc_message:"Unexpected <class \'RuntimeError\'>: CUDA Error 2, out of memory"}"\n>"}

lipere123 avatar Oct 10 '24 05:10 lipere123

I'm also intermittently experiencing this, see: https://github.com/exo-explore/exo/issues/235.

fullofcaffeine avatar Oct 10 '24 17:10 fullofcaffeine

It seems exo is unable to properly split the model into chunks so that it can be partially loaded across several nodes. (Llama 3.1 8B cannot be split across 3x 8 GB GPUs.)

FFAMax avatar Oct 28 '24 03:10 FFAMax
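To put rough numbers on the last comment, here is a minimal back-of-the-envelope sketch of how much weight memory each node would hold if the layers were split evenly. It is not exo's partitioning logic. The function name `per_node_weight_gib` is hypothetical, and the inputs are assumptions: roughly 8 billion parameters, 2 bytes per parameter (fp16/bf16), 32 transformer layers, and all weights treated as belonging to layers (embeddings, KV cache, activations, and CUDA context overhead are ignored).

```python
def per_node_weight_gib(n_params, bytes_per_param, n_layers, n_nodes):
    """Estimate weight memory (GiB) per node for an even layer-wise split.

    Hypothetical helper for illustration only; real frameworks may
    partition differently and need extra memory beyond the weights.
    """
    total_bytes = n_params * bytes_per_param
    bytes_per_layer = total_bytes / n_layers
    # Spread layers as evenly as possible; the first nodes take the remainder.
    base, extra = divmod(n_layers, n_nodes)
    layers_per_node = [base + (1 if i < extra else 0) for i in range(n_nodes)]
    return [count * bytes_per_layer / 2**30 for count in layers_per_node]


# Assumed figures for Llama 3.1 8B in 16-bit precision across 3 nodes.
estimates = per_node_weight_gib(8e9, 2, 32, 3)
for i, gib in enumerate(estimates):
    print(f"node {i}: ~{gib:.1f} GiB of weights")
```

Under these assumptions each shard's weights come in below 8 GB, which would be consistent with the suspicion that the failure lies in how the model is split rather than in total capacity. The margin left for the KV cache and runtime overhead is small, though, so this estimate alone does not rule out a genuine memory shortfall.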