Aaron Pham


You can pass the `stop` argument on the request to specify the token(s) at which generation should stop.
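A minimal sketch of what a `stop` parameter does to generated text. The helper name `apply_stop` is mine for illustration, not part of the OpenLLM API: it truncates the output at the earliest occurrence of any stop sequence.

```python
def apply_stop(text: str, stop: list[str]) -> str:
    """Truncate generated text at the earliest stop sequence, if any."""
    cut = len(text)
    for seq in stop:
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)  # keep only text before the first match
    return text[:cut]

print(apply_stop("Hello world\nGoodbye", ["\n"]))  # -> Hello world
```

In practice the server applies this logic during streaming, so generation halts as soon as a stop sequence is produced.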

Not sure I understand this, but if the client disconnects, the request will be cancelled on the vLLM backend.
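A rough sketch of the disconnect-triggered cancellation pattern, using plain `asyncio` (this is an illustration of the mechanism, not the actual vLLM or OpenLLM code): the streaming generation loop runs as a task, and when the client goes away the task is cancelled, which unwinds the loop via `CancelledError`.

```python
import asyncio

async def generate_tokens(out: list[str]) -> None:
    # Stand-in for a streaming generation loop.
    try:
        while True:
            out.append("tok")
            await asyncio.sleep(0.01)
    except asyncio.CancelledError:
        out.append("<cancelled>")  # cleanup hook runs on disconnect
        raise

async def main() -> list[str]:
    produced: list[str] = []
    task = asyncio.create_task(generate_tokens(produced))
    await asyncio.sleep(0.05)  # client stays connected briefly...
    task.cancel()              # ...then disconnects: cancel the request
    try:
        await task
    except asyncio.CancelledError:
        pass
    return produced

tokens = asyncio.run(main())
print(tokens[-1])  # -> <cancelled>
```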

Currently, I have a job that builds the binary for the musl Python wheel, and it seems to fail as well. 🤔

This has to do with SSE support in BentoML. There is a feature on the roadmap that is currently being worked on in the community Discord.

This is now finished and supported.

I will take a look into this once I'm available next week. We have logic to determine the number of GPUs here: https://github.com/bentoml/OpenLLM/blob/8d989767e838972fe10e02d78bf640904560b85e/openllm-python/src/openllm/_runners.py#L104
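As a hedged sketch of what GPU-count detection can look like (this is hypothetical logic for illustration, not the actual implementation in `_runners.py`): parse `CUDA_VISIBLE_DEVICES` from the environment and count the listed device IDs.

```python
def available_gpu_count(env: dict[str, str]) -> int:
    """Infer GPU count from CUDA_VISIBLE_DEVICES (illustrative sketch only)."""
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        return 0  # assumption for this sketch: unset means no GPUs exposed
    visible = visible.strip()
    if not visible or visible == "-1":
        return 0  # "-1" conventionally hides all devices
    return len([d for d in visible.split(",") if d.strip()])

print(available_gpu_count({"CUDA_VISIBLE_DEVICES": "0,1,3"}))  # -> 3
```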

Hi there, can you record a video showcasing this "bug"? Edit: I think I understand what you mean here.

https://github.com/jackyzha0/quartz/assets/29749331/714dcb28-f708-4841-aa79-3e05d89040df The mouse behaviour after scrolling in.