Andrew Lapp

222 comments by Andrew Lapp

Good issue, I've run into all of these problems. I disagree about `llama.cpp` though: there's no reason to include `llama-cpp-python` by default in downstream dependents such as vLLM. Additionally, `llama-cpp-python`...
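As an illustration of the alternative, a heavy backend like `llama-cpp-python` can be declared as an opt-in extra rather than a hard dependency. This is a minimal setuptools sketch with a hypothetical package name, not outlines' actual packaging:

```python
# setup.py -- illustrative only; the package name is hypothetical and
# outlines' real packaging may differ.
from setuptools import setup, find_packages

setup(
    name="example-structured-gen",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "transformers",  # core dependencies only
    ],
    extras_require={
        # heavy/optional backends are opt-in:
        #   pip install example-structured-gen[llamacpp]
        "llamacpp": ["llama-cpp-python"],
        "vllm": ["vllm"],
    },
)
```

With this layout, only users who want the llama.cpp backend pull in `llama-cpp-python`, via `pip install example-structured-gen[llamacpp]`.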

@ahmed-moubtahij Yes, outlines only becomes the bottleneck after ~1,000 tokens/s, and `vllm` is substantially faster than `transformers`. However, are you sure you set `device_map="cuda"`? It sounds like you might have been...
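For reference, this is roughly what GPU placement looks like when loading the model through `transformers` directly; a minimal sketch where the model name and prompt are placeholders:

```python
# Minimal sketch: loading a causal LM on GPU with transformers.
# Model name and prompt are placeholders; adjust for your setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # example model, swap for yours
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="cuda",          # without this, the model stays on CPU and generation is very slow
    torch_dtype=torch.float16,  # half precision to fit on a single GPU
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```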

Good idea, this is the #1 metric people care about.

- vLLM benchmark script: https://github.com/vllm-project/vllm/tree/main/benchmarks
- TensorRT-LLM: https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/performance.md#benchmarking-per-model
- llama.cpp: https://github.com/ggerganov/llama.cpp/blob/master/examples/llama-bench/README.md
- TGI: https://github.com/huggingface/text-generation-inference/blob/main/benchmark/README.md
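For a rough sense of what these scripts report, the headline number is usually generated tokens per second. A minimal timing sketch, where `generate_fn` is a stand-in for whichever engine's generate call is being benchmarked:

```python
# Rough throughput sketch: tokens/second over a batch of prompts.
# `generate_fn` is a placeholder for the engine under test and is
# assumed to return the generated token ids for one prompt.
import time

def measure_throughput(generate_fn, prompts, max_new_tokens=128):
    start = time.perf_counter()
    total_new_tokens = 0
    for prompt in prompts:
        output_ids = generate_fn(prompt, max_new_tokens=max_new_tokens)
        total_new_tokens += len(output_ids)
    elapsed = time.perf_counter() - start
    return total_new_tokens / elapsed  # generated tokens per second
```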

Can you please try `pip install git+https://github.com/lapp0/outlines@add-fsm-union-pin-core` and report back whether it works? This is the branch of an in-progress PR which fixes the Rust installation issue.

Thanks so much for helping me test it! I expect a new release soon which includes the branch mentioned above.

I assume this was with ExLlamaV2, or am I wrong? Good find.

I ran your reproduction script; thanks for informing us about this issue. Here are some samples of tokens (in byte format) which cause the error in your model:

- `\xef\xbf\xbd\xe2\x80\x9e`...
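For anyone debugging a similar report, a quick way to spot vocabulary tokens whose raw bytes don't form valid UTF-8 on their own is to decode each token individually and look for the replacement character. A rough sketch (placeholder model; this is not the reproduction script referenced above):

```python
# Rough sketch: flag vocabulary tokens whose bytes don't decode to
# valid UTF-8 on their own. The model name is a placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model

problem_tokens = []
for token_id in range(len(tokenizer)):
    decoded = tokenizer.decode([token_id])
    # U+FFFD (the replacement character) appears when a token's bytes
    # are only a fragment of a multi-byte UTF-8 sequence.
    if "\ufffd" in decoded:
        problem_tokens.append(token_id)

print(f"{len(problem_tokens)} tokens decode with replacement characters")
```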

We could update `outlines/fsm/json_schema.py` to allow arbitrary order; however, this would increase the complexity (and compilation time) of the FSM exponentially. Once we have CFG working, this will be viable....
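To see where the blow-up comes from: with n properties, a pattern that accepts every key order has to cover all n! orderings. A toy illustration of the counting, not outlines' actual FSM construction:

```python
# Toy illustration: accepting arbitrary property order means covering
# all n! orderings. This is NOT outlines' actual FSM construction.
import math
from itertools import permutations

properties = ["name", "age", "email", "address", "phone"]

orderings = [
    ",".join(f'"{key}":<value>' for key in ordering)
    for ordering in permutations(properties)
]
print(len(orderings), "alternatives for", len(properties), "properties")  # 120 = 5!
print("10 properties would need", math.factorial(10), "alternatives")     # 3,628,800
```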

I'm not familiar with any of these other than FlexFlow, unfortunately. Happy to include PRs for any of them if they are uniquely valuable inference engines.

While this doesn't fully implement your suggestion, you gave me the idea to make a sample-efficient track. https://github.com/lapp0/sample-efficient-nanogpt