Sequoia
scalable and robust tree-based speculative decoding algorithm
Hi, if I understand the tree_search algorithm correctly, the dynamic programming process should be able to find the optimal number of generated tokens according to the acceptance-rate vector. Also, given the...
Hi, I was trying to reproduce the numbers in the paper, but with `demo-config.json` and either the acceptance vector in the repo or the acceptance vector I tested myself, the...
Sorry for asking a possibly obvious question, but it would be better if the documentation made this clear.
Hi, I remember that vLLM support was on your TODO list. Have you implemented it yet? Was the main challenge in this direction that the batch size > 1 tree...
The current code is not compatible with transformers 4.39+ because the rotary functions changed. Fix: copy these functions from transformers==4.37.2.
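A minimal sketch of how a caller might detect the affected versions before importing anything that depends on the old rotary functions. The helper names (`major_minor`, `rotary_api_changed`, `installed_transformers_version`) are hypothetical and are not part of the Sequoia code base; the 4.39 threshold is taken from the report above and has not been verified against other releases.

```python
from importlib.metadata import PackageNotFoundError, version


def major_minor(ver: str) -> tuple[int, int]:
    """Parse the leading 'MAJOR.MINOR' of a version string such as '4.37.2'."""
    major, minor = ver.split(".")[:2]
    return int(major), int(minor)


def rotary_api_changed(ver: str) -> bool:
    """Return True for versions the report above describes as incompatible (4.39+)."""
    return major_minor(ver) >= (4, 39)


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None
```

A caller could check `rotary_api_changed(installed_transformers_version())` when the version is not `None` and fall back to the copied 4.37.2 functions in that case.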
Fix the loading functions to save loading time and disk space. Only the first files in each dataset are needed. Addresses #4
The dataset loading code takes too long. It downloads entire huge datasets (70 GB of wiki, etc.) to use just a handful of examples. Setting `split="train[0:2000]"` does not help since slicing...
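One possible way to avoid the full download described in this issue is the streaming mode of the Hugging Face `datasets` library, which yields examples lazily. The sketch below is an illustration under that assumption and is not necessarily the approach taken in the repository's fix; `take_first` and `load_first_examples` are hypothetical helper names.

```python
from itertools import islice


def take_first(examples, n):
    """Return the first n items of any iterable without consuming the rest."""
    return list(islice(examples, n))


def load_first_examples(name, n, **kwargs):
    """Stream a dataset's train split and keep only its first n examples.

    Assumes the `datasets` package is installed and network access is available.
    The import is deferred so that `take_first` works without `datasets`.
    """
    from datasets import load_dataset

    # streaming=True returns an iterable that fetches data on demand
    # instead of downloading the whole dataset up front.
    stream = load_dataset(name, split="train", streaming=True, **kwargs)
    return take_first(stream, n)
```

With this shape, something like `load_first_examples(dataset_name, 2000)` would read only as much data as is needed to produce the first 2000 examples, where `dataset_name` is whichever dataset the evaluation script uses.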
Hey @dreaming-panda, this looks really interesting. I wondered if you would be interested in showing an integration with Lit-GPT: https://github.com/Lightning-AI/litgpt Best, T.C
Hi Sequoia team, can this framework run on CPU devices? If so, how can we do it? Any insights? Regards