Aaron Pham
You can pass the `stop` argument in the request to specify the tokens at which generation should stop.
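For illustration, here is a minimal sketch assuming the server exposes an OpenAI-compatible `/v1/completions` endpoint; the endpoint path, port, and model name are assumptions, not confirmed by this thread. The relevant part is the `stop` field:

```python
import requests

# Hypothetical request: endpoint path and model id are placeholders.
# Generation halts as soon as any string in `stop` is produced.
resp = requests.post(
    "http://localhost:3000/v1/completions",
    json={
        "model": "my-model",           # placeholder model id
        "prompt": "List three fruits:",
        "max_tokens": 64,
        "stop": ["\n\n", "###"],       # stop sequences
    },
)
print(resp.json())
```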
I'm not sure I fully understand this, but if the client disconnects, the request will be cancelled with the vLLM backend.
Currently, I have a CI job that builds the binary for the musl Python wheel, and it seems to fail as well. 🤔
This has to do with SSE support in BentoML. It is on the feature roadmap and is currently being worked on; see the community Discord.
This is now finished and supported.
Can you share the logs?
You need to pass `--gpus all` to enable GPU access in the container.
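For example (the image name is a placeholder; `--gpus all` requires the NVIDIA Container Toolkit on the host):

```sh
# Expose every host GPU to the container; image name is a placeholder.
docker run --rm --gpus all my-openllm-image:latest
```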
I will look into this once I'm available next week. We have logic to determine the number of GPUs here: https://github.com/bentoml/OpenLLM/blob/8d989767e838972fe10e02d78bf640904560b85e/openllm-python/src/openllm/_runners.py#L104
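The actual logic lives at the link above. As a rough illustration only (this is not the code at that link), GPU counting in this kind of runner often looks like:

```python
import os

import torch


def available_gpu_count() -> int:
    # Illustrative sketch only; see the linked _runners.py for the real logic.
    # Respect CUDA_VISIBLE_DEVICES if set, otherwise ask torch directly.
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is not None:
        return len([d for d in visible.split(",") if d.strip()])
    return torch.cuda.device_count() if torch.cuda.is_available() else 0
```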
Hi there, can you record a video showcasing this "bug"? Edit: I think I understand what you meant here.
The mouse behaviour after scrolling in: https://github.com/jackyzha0/quartz/assets/29749331/714dcb28-f708-4841-aa79-3e05d89040df