
REST: Retrieval-Based Speculative Decoding, NAACL 2024

Results 15 REST issues

I want to use tensor parallelism with REST, but I cannot find the configuration option that enables it. Can you give me an example?

I followed your suggestion to modify the retriever to adapt it to Qwen2.5. When I run the "search" function, the returned sequence seems too short. Although I changed the "long" parameter, it doesn't...

When I initialize draftretriever.Reader, I get this error:

python3 gen_model_answer_rest.py
loading the datastore ...
Traceback (most recent call last):
  File "/mnt/gefei/REST/llm_judge/gen_model_answer_rest.py", line 493, in run_eval(
  File "/mnt/gefei/REST/llm_judge/gen_model_answer_rest.py", line 135, in...

I am trying to control the length of the draft tokens, specifically aiming to retrieve drafts longer than four tokens, by changing the “long” parameter when calling the search function....

When I run:

> RAYON_NUM_THREADS=6 CUDA_VISIBLE_DEVICES=0 python3 -m rest.inference.cli --datastore-path datastore/datastore_chat_small.idx --base-model meta-llama/Meta-Llama-3-8B-Instruct

I get:

> Loading checkpoint shards: 100%|██████████| 4/4...

I adapted REST to the Qwen-7B model and encountered the following problem when testing it on the HumanEval dataset: What may be the cause of this problem...

If I want to change the REST code to support multi-batch inference, what needs to be changed?

I need to integrate REST with the Transformers library. How can I support the repetition_penalty parameter through the logits_processor interface? https://github.com/huggingface/transformers/blob/481a95781404e48b1c80940be17e8279dec82fe8/src/transformers/generation/utils.py#L1735-L1745 By the way, how can I make REST support...
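For readers with the same question, here is a minimal sketch of the repetition-penalty rule itself, written in plain Python so it runs without any dependencies. It follows the rule used by the Transformers `RepetitionPenaltyLogitsProcessor` (positive logits are divided by the penalty, negative ones multiplied), but the function name `apply_repetition_penalty` and the list-based interface are hypothetical illustrations, not part of REST or Transformers.

```python
from typing import List, Sequence


def apply_repetition_penalty(
    logits: Sequence[float],
    generated_ids: Sequence[int],
    penalty: float,
) -> List[float]:
    """Return a copy of `logits` with already-generated tokens penalized.

    Hypothetical helper for illustration only. A penalty > 1.0 makes
    previously generated tokens less likely; 1.0 leaves logits unchanged.
    """
    if penalty <= 0:
        raise ValueError("penalty must be strictly positive")

    adjusted = list(logits)
    # Each token is penalized once, no matter how often it appeared.
    for token_id in set(generated_ids):
        if adjusted[token_id] < 0:
            # Negative logits are pushed further down.
            adjusted[token_id] *= penalty
        else:
            # Non-negative logits are scaled toward zero.
            adjusted[token_id] /= penalty
    return adjusted


if __name__ == "__main__":
    print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1, 1], penalty=2.0))
```

In a speculative-decoding setting, the penalty at each candidate position would presumably have to be computed from the tokens preceding that position. That is an assumption about how such a hook could be wired into REST, not something the repository documents.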

```
Suffix array initialized with length: 3055148
Calling libsais_int with parameters:
buffer.as_ptr(): 0x75fee1dff010
suffix_array.as_mut_ptr(): 0x6967ee0
buffer.len() as i32: 3055148
vocab_size: 100000
symbol_frequency_table: 0
Segmentation fault (core dumped)
```
Justing the...
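One thing that may be worth ruling out for a crash like this: libsais's integer interface expects every input symbol to lie in the range [0, k), where k is the alphabet size passed in (shown as `vocab_size: 100000` in the log). Whether that is the cause here is an assumption, not a diagnosis. The sketch below checks a token sequence before the datastore is built; the helper name `find_out_of_range_tokens` is hypothetical and not part of DraftRetriever.

```python
from typing import List, Sequence, Tuple


def find_out_of_range_tokens(
    token_ids: Sequence[int],
    vocab_size: int,
    limit: int = 5,
) -> List[Tuple[int, int]]:
    """Return up to `limit` (position, token_id) pairs outside [0, vocab_size).

    Hypothetical pre-flight check: an empty result means every token id
    fits the alphabet size that would be handed to the suffix-array builder.
    """
    offenders: List[Tuple[int, int]] = []
    for position, token_id in enumerate(token_ids):
        if token_id < 0 or token_id >= vocab_size:
            offenders.append((position, token_id))
            if len(offenders) >= limit:
                break
    return offenders


if __name__ == "__main__":
    sample = [1, 5, 100000, 3, -1]
    print(find_out_of_range_tokens(sample, vocab_size=100000))
```

If the check reports offenders, the tokenizer in use likely has a larger vocabulary than the configured `vocab_size`, and the value would need to be raised to cover the largest token id.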

https://github.com/FasterDecoding/REST/blob/6aed6ad5beb11849adfe671e874c239461ee8b84/DraftRetriever/src/lib.rs#L209-L215 Sorry, I am not familiar with Rust. What does each parameter of the reader.search() function mean? I'm very much looking forward to your reply. Thanks.