shyringo
I've also discovered this issue. Wondering if anyone is interested in solving it. It would only take a few judgement calls and a few lines of code.
> I find that we need to explicitly run "del llm.llm_engine.driver_worker" to release it when using a single worker. Can anybody explain why this is the case?

I tried the...
> I tried the above code block and also this line "del llm.llm_engine.driver_worker". Both failed for me.
>
> But I managed, with the following code, to terminate the vllm.LLM(),...
> Tried this including `ray.shutdown()` but the memory is not released on my end, any other suggestion?

You could try "del llm.llm_engine.model_executor" in the following code instead:

update: the...
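Since the snippet referred to above is cut off in the preview, here is a rough sketch (not the original code) of the kind of cleanup sequence being discussed, assuming a vLLM version where the engine exposes `model_executor` (older versions exposed `driver_worker` instead); the model name is just a placeholder:

```python
import gc

import ray
import torch

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))

# Drop the engine's reference to the component holding the model weights.
# Newer vLLM: model_executor; older vLLM: driver_worker.
del llm.llm_engine.model_executor
del llm

# Collect the now-unreferenced objects, then return cached CUDA blocks to the driver.
gc.collect()
torch.cuda.empty_cache()

# If Ray workers were started (e.g. for tensor parallelism), shut them down too.
ray.shutdown()
```

Whether this actually returns all of the GPU memory seems to depend on the vLLM version, as the replies below suggest.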
> did that as well, still no change in gpu memory allocation. Not sure how to go further

Then I do not have a clue either. Meanwhile, I should add...
> > this issue makes vllm impossible for production use
>
> At present, we have found a workaround and set the swap space directly to 0. This way, we...
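For anyone who wants to try that workaround: `swap_space` is an engine argument (in GiB) that can be passed to `LLM()`. A minimal sketch, with a placeholder model name:

```python
from vllm import LLM

# Sketch of the workaround mentioned above: set the CPU swap space to 0 GiB so
# no host memory is reserved for swapped-out KV-cache blocks.
llm = LLM(model="facebook/opt-125m", swap_space=0)
```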
Met the same issue in Offline Batched Inference: the program got stuck at the `LLM()` line and would not continue. GPU memory was occupied, but GPU utilization stayed at 0%.
#1908 might be related, but in 'Offline Batched Inference' mode.