metacryptom

Token indices sequence length is longer than the specified maximum sequence length for this model (2620 > 2048). Running this sequence through the model will result in indexing errors INFO...
This also causes a resource leak on the server.
`try: async for request_output in results_generator: if await request.is_disconnected():` The `await request.is_disconnected()` check is never executed if an error occurs (perhaps because the input exceeds the maximum length), so the request never exits, which causes the...
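The failure mode described above can be sketched without any server code. This is a minimal illustration under stated assumptions, not the project's actual handler: `failing_generator`, `abort`, `stream`, and `never_disconnected` are hypothetical stand-ins. It shows one way to make cleanup run even when the generator raises before the disconnect check is reached, by moving the cleanup into a `finally` block.

```python
import asyncio

aborted = []  # ids of requests whose cleanup ran (for illustration only)


async def failing_generator():
    # Hypothetical stand-in for results_generator: yields once, then fails,
    # e.g. because the prompt exceeds the model's maximum length.
    yield "partial output"
    raise ValueError("prompt longer than the model's maximum length")


async def abort(request_id):
    # Hypothetical cleanup hook; a real server would release the
    # resources held by the request here.
    aborted.append(request_id)


async def never_disconnected():
    # Hypothetical stand-in for request.is_disconnected()
    return False


async def stream(request_id, results_generator, is_disconnected):
    outputs = []
    finished = False
    try:
        async for request_output in results_generator:
            if await is_disconnected():
                break  # client went away: skip the else branch
            outputs.append(request_output)
        else:
            finished = True  # generator was exhausted normally
    finally:
        # Runs on disconnect and on any exception raised by the generator,
        # so the request is always cleaned up.
        if not finished:
            await abort(request_id)
    return outputs


async def demo():
    try:
        await stream("req-1", failing_generator(), never_disconnected)
    except ValueError:
        pass  # the error still propagates; only the cleanup is guaranteed
    return aborted
```

Running `asyncio.run(demo())` exercises the error path. The point of the design is that cleanup no longer depends on the loop body reaching the disconnect check.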
[Issue #320](https://github.com/vllm-project/vllm/issues/320)
This is not limited to the case where the input is too long: when a request can't be executed and added to the swap queue, newly arriving requests can't be executed either. I...