Andrew Lapp

205 comments by Andrew Lapp

Appreciate your review, fix, and interest @xuy. Will integrate that after I'm done with some bug fixes!

The code had a major bug: only single-character tokens were being selected. I just pushed a commit which fixes the bug, makes the parser functional and immutable, caches...

Currently validating ~~25 tokens per second~~ ~~39 tokens per second~~ - ~~46 tokens per second with nproc = 2~~ - ~~67 tokens per second with nproc = 4~~ - ~~74...

@brucethemoose It's pretty poorly implemented, but here you go: https://github.com/lapp0/vllm/tree/grammar-multiprocessing I've been working on integrating some of my caching changes into https://github.com/outlines-dev/outlines which already has regex-based guidance for vLLM.

@viktor-ferenczi The parser doesn't handle ambiguous terminals well. Could you try converting them to a rule? Something along the lines of

```
signed_number: ["+"|"-"] number
number: float | int
float:...
```

> **Performance:** I was running vLLM with cProfile and executed the completion some 50 times in about 2 minutes. Found the grammar responsible for only ~550ms of CPU runtime, so...
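
A measurement like the one quoted above can be sketched with the standard-library `cProfile` and `pstats` modules. This is a minimal illustration, not the profiling setup actually used against vLLM: `validate_token`, `generate`, and `profile_share` are hypothetical stand-ins, and the regex is an arbitrary placeholder for the grammar check.

```python
import cProfile
import pstats
import re


def validate_token(text: str) -> bool:
    """Hypothetical stand-in for the grammar check (not vLLM code)."""
    return re.fullmatch(r"[+-]?\d+(\.\d+)?", text) is not None


def generate(n_requests: int) -> int:
    """Hypothetical stand-in for running n_requests completions."""
    accepted = 0
    for i in range(n_requests):
        if validate_token(str(i)):
            accepted += 1
    return accepted


def profile_share(n_requests: int = 50):
    """Profile generate() and return (accepted, seconds spent in validate_token)."""
    profiler = cProfile.Profile()
    profiler.enable()
    accepted = generate(n_requests)
    profiler.disable()

    stats = pstats.Stats(profiler)
    grammar_seconds = 0.0
    # stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, own time, cumulative time, callers).
    for (_filename, _lineno, name), (_cc, _nc, _tt, ct, _callers) in stats.stats.items():
        if name == "validate_token":
            grammar_seconds += ct
    return accepted, grammar_seconds


if __name__ == "__main__":
    accepted, grammar_seconds = profile_share(50)
    print(f"accepted={accepted}, grammar CPU time={grammar_seconds * 1000:.3f} ms")
```

Comparing the cumulative time of the grammar function against `stats.total_tt` gives the share of profiled runtime it is responsible for.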

> @jqueguiner The custom logits processors need some more information to be passed to avoid having to patch vLLM the hard way. Primary example is a way to identify the...

Sure @simon-mo, I'll follow up with you on any changes to vLLM that are necessary. Thanks for your enthusiastic support! Closing in favor of outlines. A few changes necessary in...

In docker build,

```
193.3 ERROR: Cannot install -r requirements.txt (line 8) and triton==2.0.0 because these package versions have conflicting dependencies.
193.3 The conflict is caused by:
193.3 The user...
```

Smoke tested CUDA and ExLlama kernels on A100. Saw a substantial memory reduction. Worked without problems.