SangBin Cho
I found that when I don't specify it, this is returned:
```
Thread 0x7FB1278F5740 (active): "MainThread"
    main_loop (ray/_private/worker.py:763)
    (ray/_private/workers/default_worker.py:233)
Thread 860 (idle): "ray_import_thread"
    wait (threading.py:300)
    _wait_once (grpc/_common.py:106)
    wait (grpc/_common.py:148)
    result (grpc/_channel.py:735)
    _poll_locked...
```
Has this issue been resolved? I am observing this behavior in https://github.com/ray-project/ray/ when we run gpustat.new_query() repeatedly on GCE.
A lot of time is spent in nvmlInit, nvmlShutdown, and nvmlDeviceGetHandleByIndex.
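For anyone who wants to check where the time goes on their own machine, here is a rough sketch using the standard-library profiler. `profile_repeated` is a hypothetical helper name, not part of gpustat or Ray, and the usage in the trailing comment assumes gpustat is installed and an NVIDIA GPU is visible.

```python
import cProfile
import io
import pstats


def profile_repeated(fn, repeat=100, top=10):
    """Call fn() `repeat` times under cProfile.

    Returns the pstats report as a string, sorted by cumulative time
    and limited to the `top` entries.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeat):
        fn()
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


# Usage (assumption: gpustat is installed and a GPU is available):
#
#   import gpustat
#   print(profile_repeated(gpustat.new_query, repeat=50))
```

If the NVML init, shutdown, and handle-lookup calls dominate the cumulative column, that would match what I am seeing.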
Duplicated as https://github.com/ray-project/ray/issues/29758
I believe that if you implement a tokenizer class that works with this API (https://github.com/vllm-project/vllm/blob/3492859b687ba18db47720bcf6f07289999a2df5/vllm/transformers_utils/tokenizer_group/tokenizer_group.py#L42), you can use https://github.com/vllm-project/vllm/blob/3492859b687ba18db47720bcf6f07289999a2df5/vllm/entrypoints/llm.py#L118 to set the tokenizer.
Is this still the case when you compare against the same version of vllm?
Can you try with a lower version of OSS vllm in this case?
Oh, I meant to use 0.2.7 for "2" (since you cannot upgrade openllm).
I will take a look at it soon!
How can I reproduce this?