Yang Sheng
Same experimental results here; I don't know why.
I observed that the GPU utilization of the decode instances is very low.
Benchmarking is an art form, and the percentage improvement in throughput varies with model size and GPU. vLLM 0.6.0 is optimized for high-throughput scenarios, particularly where CPU...
> You might be using the method incorrectly. We ensure consistency with transformers through unit tests and CI:
>
> https://github.com/sgl-project/sglang/blob/c500f96bb16c686ee8ba5d5f1fc716a0bd8e5fff/test/srt/models/test_generation_models.py#L64-L130

```
max_diff tensor(0.0251)
max_diff tensor(0.0225)
max_diff tensor(0.0333)
hf_outputs.output_strs=[' ________.(\u3000\u3000)\nA. London\nB. Paris\nC. Tokyo\nD. Beijing\n\n答案:A\n考查英文常识.根据', " to go out for a walk. I'm wearing my favorite pair of jeans, a white t-shirt, and...
```
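For reference, the `max_diff` values above come from comparing prefill logits between backends. A minimal sketch of that kind of comparison using plain transformers (the model id is a placeholder, and `srt_logits` is a stand-in for what `SRTRunner` would return in the real test):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; the CI test iterates over several models.
model_id = "Qwen/Qwen2-1.5B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    hf_logits = model(ids).logits[0]  # [seq_len, vocab_size]

# In the real test these come from SRTRunner; use a stand-in here.
srt_logits = hf_logits.clone()

max_diff = torch.max(torch.abs(hf_logits - srt_logits))
print(f"max_diff {max_diff}")  # values around 1e-2 are typical fp16 kernel noise
```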
> You can verify SGLang's accuracy using https://github.com/fw-ai/llm_eval_meta to **match the data in the official Llama 3.1 tech report**. Regarding the issue you mentioned about...
> @cherishhh It seems you didn't understand my previous reply. Currently, the eval scores of SGLang and the official scores of Llama 3.1 are **consistent**; there is **no issue with...
> > The output of gemma-2-2b from SRT is unstable on the commented prompt.
>
> Google's [Gemma-2 model](https://arxiv.org/abs/2408.00118) uses interleaved window attention to reduce computational complexity for long contexts,...
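For readers unfamiliar with the term, here is a minimal sketch of what interleaved (sliding-window / global) attention masking looks like; the layer count and window size are illustrative, not Gemma-2's actual configuration:

```python
import torch

def causal_mask(seq_len: int, window: int | None = None) -> torch.Tensor:
    """Boolean mask where True means 'query i may attend to key j'."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    mask = j <= i  # standard causal constraint
    if window is not None:
        mask &= (i - j) < window  # keep only the last `window` keys
    return mask

# Alternate sliding-window and global layers, Gemma-2 style (toy sizes).
seq_len, window, num_layers = 8, 4, 4
for layer in range(num_layers):
    w = window if layer % 2 == 0 else None
    kind = f"sliding(window={w})" if w else "global"
    print(f"layer {layer}: {kind}, attended keys per query:",
          causal_mask(seq_len, w).sum(dim=1).tolist())
```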
Not just Qwen: testing Llama3 also showed this phenomenon.

> I also noticed that for the qwen model, when the output length exceeds 32, using HFRunner and SRTRunner results...
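A minimal way to reproduce that comparison, assuming the `HFRunner`/`SRTRunner` interface used in the CI test linked earlier (constructor and `forward` arguments may differ between sglang versions):

```python
# Sketch only: interfaces assumed from test/srt/models/test_generation_models.py.
from sglang.test.runners import HFRunner, SRTRunner

model_path = "Qwen/Qwen2-1.5B"  # placeholder model
prompts = ["The capital of France is"]
max_new_tokens = 64  # > 32, where the divergence was reported

with HFRunner(model_path) as hf_runner:
    hf_outputs = hf_runner.forward(prompts, max_new_tokens=max_new_tokens)
with SRTRunner(model_path) as srt_runner:
    srt_outputs = srt_runner.forward(prompts, max_new_tokens=max_new_tokens)

for hf_str, srt_str in zip(hf_outputs.output_strs, srt_outputs.output_strs):
    print("match" if hf_str == srt_str
          else f"diverged:\nHF : {hf_str}\nSRT: {srt_str}")
```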
> @cherishhh @Abdulhanan535 @tanmaylaud You can take a look at the implementation of https://github.com/fw-ai/llm_eval_meta/blob/main/analyze_answers.py. When evaluating, to determine if the model answer is correct, you can refer to https://github.com/fw-ai/llm_eval_meta/blob/b1166abf1395eafd3a994aefed5f6a420e697289/analyze_answers.py#L107-L119 **It...
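The exact matching logic lives at the linked L107-L119. As a purely hypothetical sketch of that kind of multiple-choice check (the real code may normalize answers differently):

```python
import re

def is_correct(model_answer: str, target: str) -> bool:
    """Hypothetical matcher; not the actual analyze_answers.py code."""
    # Take the first standalone choice letter in the model's output.
    m = re.search(r"\b([A-D])\b", model_answer)
    return m is not None and m.group(1) == target.strip().upper()

print(is_correct("The answer is B.", "B"))      # True
print(is_correct("I would pick C here.", "B"))  # False
```

The point such logic illustrates is that scoring depends on how the answer letter is extracted from free-form output, which is why small formatting differences between backends can move eval scores.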