Issue with raylet error
Hi, I'm using vLLM to run LLaMA-13B on two V100-16GB GPUs. I deployed vLLM with the API server. However, when the context is long, the server returns:
[2023-08-09 22:39:16,002 E 209 223] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2023-08-09_22-27-32_558284_37 is over 95% full, available space: 60427313152; capacity: 1599538507776. Object creation will fail if spilling is required.
and the model gets stuck and cannot return anything. Is this because the GPU memory is too small, or are there other approaches to resolving this issue? Thanks!
@ZihanWang314 I got the same warning, but the model is still running. It seems there is not enough disk space; just use df -h to check the disk space.
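If it's easier to script, here's a stdlib-only Python sketch that reports the same numbers df -h shows for the filesystem holding /tmp/ray (the default Ray temp dir):

import shutil

# shutil.disk_usage reports usage of the whole filesystem containing the path,
# which is what raylet's file_system_monitor checks against the 95% threshold.
usage = shutil.disk_usage("/tmp/ray")
percent_used = 100 * (usage.total - usage.free) / usage.total
print(f"total={usage.total:,} B, free={usage.free:,} B, used={percent_used:.1f}%")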
You can point /tmp/ray at a directory with free space; for example, use ln -s space_free_dir /tmp/ray
I'm curious because I hit the same problem: the disk space used by Ray's spilling keeps growing until an out-of-disk error occurs.
I got the same error, and ln -s space_free_dir /tmp/ray does not work for me.
How do I tell Ray not to use /tmp/ray?
Did anyone resolve this issue? I'm struggling with the same problem.
I had this issue when using a Docker container. I was able to work around it by mounting an empty host directory to /tmp/ray. I hope this solution helps someone.
For example:
mkdir ./tmp_local
docker run -v "$(pwd)/tmp_local":/tmp/ray ...
(Note that docker run -v needs an absolute host path for a bind mount, hence the $(pwd).)
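If you want to verify from inside the container that /tmp/ray actually landed on the mounted volume (and not on the container's overlay filesystem), a small stdlib-only sketch:

import os, shutil

# A bind mount puts /tmp/ray on a different device than the container root,
# so comparing st_dev tells you whether the mount took effect.
print("mounted separately:", os.stat("/tmp/ray").st_dev != os.stat("/").st_dev)
print("free bytes on /tmp/ray:", shutil.disk_usage("/tmp/ray").free)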
Clean up the disk and keep it under 95% usage; that should fix the issue.
Same error here: raylet is taking up space in /tmp.
Is there a way to tell raylet to use another folder for temporary objects directly from vLLM's options?
This is a problem when using managed services such as Vertex AI or SageMaker that run the model in a container for you: the container is started with args the user has no control over, so you can't mount /tmp over the host's volume to get more storage.
If you're running the model in a container, allocate enough shared memory via the --shm-size arg; then, within your container:
ray.init(_temp_dir="/dev/shm/tmp_or_whatever", num_gpus=NUM_GPUS, ...)
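For reference, a minimal sketch of this approach (the temp dir and model name are placeholders, and it assumes vLLM reuses the already-initialized Ray session, as it did at the time of this thread):

import ray
from vllm import LLM

# Initialize Ray first so its session/temp dir (and any object spilling)
# lives in shared memory instead of /tmp/ray.
ray.init(_temp_dir="/dev/shm/ray_tmp", num_gpus=2)

# vLLM attaches to the existing Ray runtime for tensor parallelism.
llm = LLM(model="huggyllama/llama-13b", tensor_parallel_size=2)

Keep in mind /dev/shm is RAM-backed, so anything spilled there consumes memory; size --shm-size accordingly.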
Cleaning up the disk and keeping it under 95% usage is the proper solution. When your disk reaches 100%, things will hang regardless. It is likely that the Hugging Face cache is full of downloaded model weights (I've experienced this before).
Some potential solutions:
- Clean /tmp/ray
- Clean other dirs with high disk usage (likely the Hugging Face cache)
- Use a bigger volume
- Use a different spilling dir on a disk with more space, via the config described at https://docs.ray.io/en/master/ray-core/objects/object-spilling.html#cluster-mode (see the sketch below)
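For the last item, a sketch based on the linked Ray docs (the spill directory is a placeholder; pick one on a large volume):

import json
import ray

# Redirect Ray's object spilling away from /tmp/ray to a bigger disk.
ray.init(
    _system_config={
        "object_spilling_config": json.dumps(
            {"type": "filesystem", "params": {"directory_path": "/mnt/big_disk/ray_spill"}}
        )
    }
)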
If you cannot control your temp dir, another option is to disable this check, but note that this can cause a hang. You can do it by setting RAY_local_fs_capacity_threshold=1 when you start Ray, i.e., RAY_local_fs_capacity_threshold=1 ray start ...
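If you start Ray from Python instead of ray start, the equivalent (a sketch, assuming the variable is inherited by the raylet processes that ray.init spawns) is:

import os
import ray

# A threshold of 1 (i.e. 100%) effectively disables the 95%-full check.
# Caveat from above: once the disk truly fills, things may hang instead.
os.environ["RAY_local_fs_capacity_threshold"] = "1"
ray.init()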
First: rm -rf /tmp/ray. Then: mkdir an empty dir on a volume with plenty of free space. Then: ln -s new_empty_dir /tmp/ray. Finally: check that the symlink took effect: df /tmp/ray
I don't quite understand this thread. What is the actual issue here? I am facing it as well with a Docker container using Ray (not actually using vLLM, but the issue is the same, I suppose). I have enough hard-drive storage, but it somehow calculates the 'available space' wrong. So I read that mounting a random empty folder from the host machine to the tmp folder helps here? Why? How? Does it make things slower?
I am in your shoes, but I don't understand the mounting solution. Why does this work? How is the available space calculated? Is /tmp limited in size? It's so weird.