Sequoia

Published 20 hours ago


Metadata

scalable and robust tree-based speculative decoding algorithm

Readme
Issues

Results 10 Sequoia issues


Estimate the number of generated tokens per step from the acceptance-rate-vector?

1 comment

Hi, if I understand the tree_search algorithm correctly, the dynamic programming process should be able to find the optimal number of generated tokens according to the acceptance-rate vector. Also, given the...
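For readers following this question, here is a minimal sketch of how an expected token count could be derived from an acceptance-rate vector. It assumes a positional acceptance model in which the k-th child of any node is accepted with probability `acc[k]`, independent of context, and the root always contributes one token. The function names and the nested-list tree encoding are hypothetical and are not taken from the repository's `tree_search`.

```python
from functools import lru_cache


def expected_tokens(tree, acc):
    """Expected tokens per step for a fixed tree.

    `tree` is a nested list: each node is the list of its child subtrees,
    in draft order. `acc[k]` is the assumed acceptance rate of the k-th child.
    """
    total = 1.0  # the node itself counts as one token once it is reached
    for k, child in enumerate(tree):
        if k >= len(acc):
            break  # no acceptance rate known for this sibling position
        total += acc[k] * expected_tokens(child, acc)
    return total


def best_expected_tokens(n, acc):
    """Best expected tokens over all trees with at most n nodes (n >= 1)."""
    acc = tuple(acc)

    @lru_cache(maxsize=None)
    def tree(size):
        # One node for the root, the rest is split among its children.
        return 1.0 + forest(size - 1, 0)

    @lru_cache(maxsize=None)
    def forest(budget, k):
        # Best value of the siblings at positions k, k+1, ... using `budget` nodes.
        if budget == 0 or k >= len(acc):
            return 0.0
        return max(
            acc[k] * tree(s) + forest(budget - s, k + 1)
            for s in range(1, budget + 1)
        )

    return tree(n)


if __name__ == "__main__":
    acc = [0.8, 0.1]  # illustrative values, not measured data
    print(expected_tokens([[[]]], acc))   # chain of three nodes
    print(expected_tokens([[], []], acc)) # root with two children
    print(best_expected_tokens(3, acc))
```

Under these assumptions the dynamic program returns the expected tokens per step directly, so the estimate the question asks about falls out of the same recursion that picks the tree shape.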

Reproducibility: tree_search generates too small a tree

8 comments

Hi, I was trying to reproduce the numbers in the paper, but with the `demo-config.json`, plus the acceptance vector in the repo or the acceptance vector I tested myself, the...

How to benchmark for speedup and acceptance rate?

7 comments

Sorry for asking a possibly obvious question, but it would be better if the documentation made this clear.

Support for vLLM?

2 comments

Hi, I remember that vLLM support was on your TODO list. Have you achieved it now? Was the main challenge in this direction that the batch size > 1 tree...

paths fixed in tests/run_A100

Rotary fix

The current code is not compatible with transformers 4.39+ because the rotary functions changed. Fix: copied these functions from transformers==4.37.2.
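A small version guard can make this kind of incompatibility visible at startup. The sketch below is an illustration only: the helper names are hypothetical, and the 4.39 threshold is taken from the description above rather than from testing each release.

```python
from importlib.metadata import PackageNotFoundError, version


def major_minor(v):
    """Return (major, minor) from a version string such as '4.37.2'."""
    parts = v.split(".")
    return int(parts[0]), int(parts[1])


def rotary_api_matches(installed, first_incompatible="4.39"):
    """True if `installed` is older than the first release reported as incompatible."""
    return major_minor(installed) < major_minor(first_incompatible)


def check_transformers():
    """Return True/False for the installed transformers, or None if it is absent."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return None
    return rotary_api_matches(installed)


if __name__ == "__main__":
    print(check_transformers())
```

Vendoring the functions, as the fix does, removes the dependency on the library's internals; a guard like this is only useful where the original imports are kept.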

Fix datasets

Fix the loading functions to save loading time and disk space. Only the first files in each dataset are needed. Addresses #4.

data loading timing and disk use

The dataset loading code takes too long. It downloads entire huge datasets (70G wiki, etc.) to use just a handful of examples. Setting `split="train[0:2000]"` does not help since slicing...
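One possible way around the full download, sketched here as an assumption and not as the change made in the repository, is to stream the dataset and stop after the first few examples. `load_dataset(..., streaming=True)` is part of the Hugging Face `datasets` library; the helper names `take_first` and `load_head` are hypothetical.

```python
from itertools import islice


def take_first(examples, n):
    """Materialize only the first n items of any iterable, lazily."""
    return list(islice(examples, n))


def load_head(name, n, **kwargs):
    """Stream a dataset's train split and keep the first n examples.

    Imported lazily so `take_first` works without `datasets` installed.
    """
    from datasets import load_dataset

    # streaming=True returns an iterable dataset that fetches data on demand
    # instead of downloading every file up front.
    stream = load_dataset(name, split="train", streaming=True, **kwargs)
    return take_first(stream, n)


if __name__ == "__main__":
    # Works on any iterable; no network access needed for this demonstration.
    print(take_first(iter(range(1_000_000)), 3))
```

Whether streaming is faster in practice depends on the dataset's file layout, so it is worth timing against the approach of loading only the first data files.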

Integration with Lit-GPT

2 comments

Hey @dreaming-panda, this looks really interesting. I wondered if you would be interested in showing an integration with Lit-GPT: https://github.com/Lightning-AI/litgpt Best, T.C

Work On CPU

1 comment

Hi Sequoia team, can this code framework run on CPU devices? If so, how can we do it? Any insights? Regards

About

scalable and robust tree-based speculative decoding algorithm

inference

efficiency

llm

speculative-decoding

305 Stars

31 Forks

