SangBin Cho

292 comments by SangBin Cho

I found that when I don't specify this, the following is returned:

```
Thread 0x7FB1278F5740 (active): "MainThread"
    main_loop (ray/_private/worker.py:763)
    (ray/_private/workers/default_worker.py:233)
Thread 860 (idle): "ray_import_thread"
    wait (threading.py:300)
    _wait_once (grpc/_common.py:106)
    wait (grpc/_common.py:148)
    result (grpc/_channel.py:735)
    _poll_locked...
```

Has this issue been resolved? I am observing this behavior in https://github.com/ray-project/ray/ when we run `gpustat.new_query()` repeatedly on GCE. ![profile](https://github.com/wookayin/gpustat/assets/18510752/5db64f3b-b3a7-4e4a-ac6b-93e603159ac4)

Lots of time is spent in `nvmlInit`, `nvmlShutdown`, and `nvmlDeviceGetHandleByIndex`.
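One way to avoid paying that cost on every query is to initialize NVML once and cache device handles across queries, instead of running init/shutdown per call. A minimal sketch of the pattern in plain Python; the `_nvml_init` and `_get_handle` methods here are hypothetical stand-ins for pynvml's `nvmlInit()` and `nvmlDeviceGetHandleByIndex()`, not the actual gpustat implementation:

```python
class CachedGpuQuerier:
    """Sketch: pay the expensive init cost once, reuse handles per query.

    The NVML calls are mocked out with placeholders; swap in the real
    pynvml calls when adapting this pattern.
    """

    def __init__(self):
        self._initialized = False
        self._init_calls = 0          # instrumentation for this sketch
        self._handles = {}            # device index -> cached handle

    def _nvml_init(self):
        # Real code would call pynvml.nvmlInit() here, guarded the same way.
        if not self._initialized:
            self._init_calls += 1
            self._initialized = True

    def _get_handle(self, index):
        # Real code: pynvml.nvmlDeviceGetHandleByIndex(index), cached.
        if index not in self._handles:
            self._handles[index] = f"handle-{index}"
        return self._handles[index]

    def query(self, index=0):
        self._nvml_init()             # no-op after the first call
        handle = self._get_handle(index)
        # Real code would read utilization/memory via the handle here.
        return handle
```

With this shape, repeated `query()` calls reuse the same handle, so the hot spots in the profile above (init, shutdown, handle lookup) collapse into one-time costs.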

Duplicate of https://github.com/ray-project/ray/issues/29758

I believe that if you implement a tokenizer class that works with this API (https://github.com/vllm-project/vllm/blob/3492859b687ba18db47720bcf6f07289999a2df5/vllm/transformers_utils/tokenizer_group/tokenizer_group.py#L42), you can use https://github.com/vllm-project/vllm/blob/3492859b687ba18db47720bcf6f07289999a2df5/vllm/entrypoints/llm.py#L118 to set the tokenizer.
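As a rough illustration of what "a tokenizer class" means here, a toy duck-typed tokenizer might look like the sketch below. The `encode`/`decode` method names are assumptions for illustration only; the exact interface vLLM expects is defined at the TokenizerGroup link above and should be checked against that file:

```python
class WhitespaceTokenizer:
    """Hypothetical minimal tokenizer sketch (not vLLM's real interface).

    Builds a vocabulary lazily: each previously unseen whitespace-separated
    token gets the next integer id.
    """

    def __init__(self):
        self.vocab = {}       # token string -> integer id
        self.inv_vocab = {}   # integer id -> token string

    def encode(self, text):
        ids = []
        for tok in text.split():
            if tok not in self.vocab:
                idx = len(self.vocab)
                self.vocab[tok] = idx
                self.inv_vocab[idx] = tok
            ids.append(self.vocab[tok])
        return ids

    def decode(self, ids):
        return " ".join(self.inv_vocab[i] for i in ids)
```

Any class with the methods the linked API actually requires could then be passed in the same way.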

Is this the same when you compare the same vLLM version?

Oh, I meant to use 0.2.7 for "2" (since you cannot upgrade openllm).
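For an apples-to-apples comparison, the second environment could be pinned like this (a sketch; the fresh-virtualenv setup is an assumption, not part of the original comment):

```shell
# Hypothetical setup: isolate the comparison run in its own environment
python -m venv vllm-027-env
. vllm-027-env/bin/activate
pip install "vllm==0.2.7"
python -c "import vllm; print(vllm.__version__)"
```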