Aditya Kamat

Results: 9 comments by Aditya Kamat

Hi! I'd like to work on this issue. I understand you need metadata-based search/filtering in the knowledge base documents list, beyond the current name-only search. This would help manage large...

> Out of curiosity, does the same happen with a much lower limit, like 10?

Yes, it even happens with limit=2.

I have experience with Spec. Can I look into it?

Hey @hnyls2002, I did some digging. I think the OOM comes from kv_allocated_len not being updated in the paged allocation paths. In eagle_info.py's prepare_for_verify(), the page_size == 1 branch updates kv_allocated_len...

> In the comparison here, it seems that the performance of ngram is not yet higher than that of "no spec". Could you provide the specific startup parameters and pressure...

> > > In the comparison here, it seems that the performance of ngram is not yet higher than that of "no spec". Could you provide the specific startup parameters...

@hnyls2002 Hey, can you review this once?

> Feeding this source code and the vLLM [implementation](https://github.com/vllm-project/vllm/blob/da3222f371b48c8e2548ec22767523394580a1c5/vllm/v1/spec_decode/suffix_decoding.py#L4) to Sonnet and asking it a few times, it thinks the main difference is the variable speculation lengths. vLLM produces variable...