Yixin Dong comments

Results: 44 comments by Yixin Dong

[Bug] Token IDs not accepted by JSON grammar

Thanks for the report, @dtkettler. There are currently several issues with Llama 3 because it changes the tokenizer significantly. They will be fixed in the next few days.

[Bug] Qwen2-1.5B Q4F16_0 - libc++abi: terminating due to uncaught exception of type std::length_error: vector

Hi @digisomni, thanks for reporting the error! Could you provide the complete error message and the script to reproduce the error so we can better identify the problem? I failed...

[Model] Add support for Aya-23 8B Model by Cohere

@GunjanDhanuka The tokenizer issue is solved in this PR: #2649. Please tell me if there are any other related problems!

[Bug] chatglm4 mlc_llm shows error "TVMError: Check failed: append_length > 0 (0 vs. 0) : Append with length 0 is not allowed." during mlc_llm chat CLI

This is related to a recent tokenizer change in #2416. We will fix it soon.

[Bug] chatglm4 mlc_llm shows error "TVMError: Check failed: append_length > 0 (0 vs. 0) : Append with length 0 is not allowed." during mlc_llm chat CLI

See #2532

feat: Support Spec V2 + Constrained Decoding

> Thanks, I test your impl use Llama3.1-8b-Instruct and [eagle model](https://huggingface.co/yuhuili/EAGLE-LLaMA3.1-Instruct-8B). When set `export SGLANG_ENABLE_SPEC_V2=0`, the response satisfies `r"^user@example\.com$"` . When set `export SGLANG_ENABLE_SPEC_V2=1`, the response is `use the following...

Can we add a benchmark on end-to-end evaluation to show the time breakdown?