metacryptom comments


The async_llm_engine may have resource leak when using stream

![image](https://github.com/vllm-project/vllm/assets/98044045/8e4dab93-e517-4294-abcd-31445be8aab7)

The async_llm_engine may have resource leak when using stream

Token indices sequence length is longer than the specified maximum sequence length for this model (2620 > 2048). Running this sequence through the model will result in indexing errors INFO...

The async_llm_engine may have resource leak when using stream

This also causes the server to leak resources.

Fix #issues/320

`try: async for request_output in results_generator: if await request.is_disconnected():` The `await request.is_disconnected()` check is never executed if an error occurs (for example, the input length exceeds the maximum), so the request never exits, which causes the...
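To illustrate the point, here is a minimal, self-contained sketch of one way to make sure cleanup runs even when the stream raises before the disconnect check is reached. It does not use vLLM itself: `results_generator`, `abort_request`, and `stream` are hypothetical stand-ins written for this example, and the `ValueError` only simulates an engine error such as an over-long input.

```python
import asyncio

# Hypothetical record of request ids whose resources were released.
aborted = []


async def results_generator(fail: bool):
    """Stand-in for the engine's output stream; raises midway when fail=True."""
    yield "token-1"
    if fail:
        # Simulated engine error, e.g. input longer than the model maximum.
        raise ValueError("input longer than max length")
    yield "token-2"


async def abort_request(request_id: str) -> None:
    """Hypothetical cleanup hook; a real server would free engine state here."""
    aborted.append(request_id)


async def stream(request_id: str, fail: bool) -> list:
    outputs = []
    finished = False
    try:
        async for out in results_generator(fail):
            outputs.append(out)
        finished = True
    except ValueError:
        # Error path: any disconnect check inside the loop body is skipped.
        pass
    finally:
        # Cleanup no longer depends on the loop body being reached.
        if not finished:
            await abort_request(request_id)
    return outputs


ok = asyncio.run(stream("req-ok", fail=False))
bad = asyncio.run(stream("req-bad", fail=True))
```

The design choice here is to move the cleanup into `finally`, so it runs whether the loop ends normally, raises, or is cancelled; whether the actual fix takes this shape is not stated in the comment above.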

Fix #issues/320

[Issue #320](https://github.com/vllm-project/vllm/issues/320)

Fix #issues/320

This is not limited to the case where the input is too long: when a request can't be executed and is added to the swap queue, newly arriving requests can't be executed either. I...