Aaron Pham
You can pass the `stop` argument in the request to specify the tokens at which generation should stop.
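For illustration, here is a minimal sketch assuming the server exposes an OpenAI-compatible `/v1/completions` endpoint; the endpoint path, port, and model name are assumptions, not confirmed by this thread. The relevant part is the `stop` field:

```python
import requests

# Hypothetical request: endpoint path and model id are placeholders.
# Generation halts as soon as any string in `stop` is produced.
resp = requests.post(
    "http://localhost:3000/v1/completions",
    json={
        "model": "my-model",           # placeholder model id
        "prompt": "List three fruits:",
        "max_tokens": 64,
        "stop": ["\n\n", "###"],       # stop sequences
    },
)
print(resp.json())
```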
I'm not sure I fully understand this, but if the client disconnects, the request will be cancelled with the vLLM backend.
Currently, I have a CI job that builds the binary for the musl Python wheel, and it seems to fail as well. 🤔
This has to do with SSE support in BentoML. It is on the feature roadmap and is currently being worked on; see the community Discord.
This is now finished and supported.
Can you share the logs?
You need to pass `--gpus all` to enable GPU access in the container.
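For example (the image name is a placeholder; `--gpus all` requires the NVIDIA Container Toolkit on the host):

```sh
# Expose every host GPU to the container; image name is a placeholder.
docker run --rm --gpus all my-openllm-image:latest
```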
I will look into this once I'm available next week. We have logic to determine the number of GPUs here: https://github.com/bentoml/OpenLLM/blob/8d989767e838972fe10e02d78bf640904560b85e/openllm-python/src/openllm/_runners.py#L104
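The actual logic lives at the link above. As a rough illustration only (this is not the code at that link), GPU counting in this kind of runner often looks like:

```python
import os

import torch


def available_gpu_count() -> int:
    # Illustrative sketch only; see the linked _runners.py for the real logic.
    # Respect CUDA_VISIBLE_DEVICES if set, otherwise ask torch directly.
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is not None:
        return len([d for d in visible.split(",") if d.strip()])
    return torch.cuda.device_count() if torch.cuda.is_available() else 0
```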
Hi there, can you record a video showcasing this "bug"? Edit: I think I understand what you meant here.
The mouse behaviour after scrolling in: https://github.com/jackyzha0/quartz/assets/29749331/714dcb28-f708-4841-aa79-3e05d89040df