infinity
How to limit memory usage?
When I run:
port=3000
model1=Salesforce/SFR-Embedding-Code-2B_R
volume=$PWD/data
docker run -it --gpus device=0 \
-v $volume:/app/.cache \
-p $port:$port \
michaelf34/infinity:latest \
v2 \
--model-id $model1 \
--port $port \
--model-warmup \
--batch-size 4
I get this warning:
accelerate.utils.modeling INFO: We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
How can I set `max_memory` so that the model uses only part of the GPU, e.g. 30% of GPU 0?
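
For reference, here is a minimal sketch of what a 30% cap would look like at the Python level. `max_memory` in accelerate is a mapping from device index to a size such as "7372MiB". The helper name `build_max_memory` is hypothetical (not part of infinity or accelerate), and I have not confirmed that the infinity CLI forwards such a setting to the model loader, so this only shows the shape of the value, not a working flag.

```python
def build_max_memory(total_bytes: int, fraction: float, device: int = 0) -> dict:
    """Build a max_memory-style mapping capping `device` at a fraction of its memory.

    Hypothetical helper for illustration: `total_bytes` is the GPU's total
    memory in bytes and `fraction` is the share the model may use (0 < f <= 1).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Round down to whole MiB so the cap never exceeds the requested share.
    cap_mib = int(total_bytes * fraction) // (1024 ** 2)
    return {device: f"{cap_mib}MiB"}


# Example: a 24 GiB card limited to 30%.
limits = build_max_memory(24 * 1024 ** 3, 0.30, device=0)
print(limits)

# Assumption: when loading directly with transformers instead of the CLI,
# a mapping like this can be passed together with device_map="auto", e.g.
#   AutoModel.from_pretrained(model_id, device_map="auto", max_memory=limits)
# With PyTorch available, the total can be read via torch.cuda.mem_get_info(0)[1].
```

Note that `max_memory` limits where the model weights are placed; activations and batch buffers at inference time come on top of that.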