SangBin Cho
+1 on this. It worked very well with Ray, especially when users hit a bug that is already fixed on master.
I may have time to look at it after https://github.com/vllm-project/vllm/pull/3631
cc @matthewdeng what's the best way to debug object store memory usage for xgboost on ray? @showkeyjar I think your workload has high object store usage which triggers spilling https://docs.ray.io/en/master/ray-core/objects/object-spilling.html....
Based on your output ^, it looks like spilling actually doesn't really happen. I guess most of disk usage is from ray logs?
Is it correct that the disk usage is mostly from `/tmp/ray/session_latest/logs/`?
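One quick way to check is to total the file sizes under that directory. This is a minimal sketch, not a Ray utility: `dir_size_bytes` is a hypothetical helper name, and it assumes the default log location mentioned above.

```python
import os


def dir_size_bytes(path):
    """Return the total size in bytes of regular files under `path`.

    Hypothetical helper for illustration; symlinked files are skipped and
    files that vanish mid-scan (e.g. rotated logs) are ignored.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # File was removed or is unreadable; skip it.
                pass
    return total


if __name__ == "__main__":
    # Assumed default Ray log directory, taken from the question above.
    log_dir = "/tmp/ray/session_latest/logs/"
    print(f"{log_dir}: {dir_size_bytes(log_dir) / 1e6:.1f} MB")
```

If this number is close to the total disk usage you see, the logs are the main contributor rather than spilled objects.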
QQ: is this PR still active?
Btw, it also seems like vllm flash attn is built with glibc 2.32, and is not compatible with Ubuntu 20.04 (which has glibc 2.31 by default), which I believe is...
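To confirm which C library version a machine has, something like the following can be run. It is only a sketch: `glibc_at_least` and `parse_version` are hypothetical helper names, and the 2.32 threshold is just the version mentioned above, not a verified requirement of the wheel.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.31' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_at_least(required, found=None):
    """Return True if the C library version is >= `required`.

    `found` can be passed explicitly; otherwise it is read from
    platform.libc_ver(), which reports ('glibc', '<version>') on typical
    Linux systems and empty strings where it cannot tell.
    """
    if found is None:
        lib, found = platform.libc_ver()
        if lib != "glibc" or not found:
            return False
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    print("libc:", platform.libc_ver())
    # 2.32 is the version quoted in the comment above (an assumption here).
    print("new enough for 2.32:", glibc_at_least("2.32"))
```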
I see, that makes a lot of sense! I think vendoring it the current way should be fine! Besides, if the wheel is available for 20.04, we can also run...
Lmk when this is not WIP anymore!
Synced offline. We will start with 1 file after the job is assigned to workers. After that, I will follow up if there's a solution not to lose logs before jobs...