Yang Sheng
Same experimental results here; I don't know why.
I observed that the GPU utilization of the decode instances is very low.
Benchmarking is an art form, and the percentage improvement in throughput varies with model size and GPU. vLLM 0.6.0 is optimized for high-throughput scenarios, particularly where CPU...
> You might be using the method incorrectly. We ensure consistency with transformers through unit tests and CI:
>
> https://github.com/sgl-project/sglang/blob/c500f96bb16c686ee8ba5d5f1fc716a0bd8e5fff/test/srt/models/test_generation_models.py#L64-L130

```
max_diff tensor(0.0251)
max_diff tensor(0.0225)
max_diff tensor(0.0333)
hf_outputs.output_strs=[' ________.(\u3000\u3000)\nA. London\nB. Paris\nC. Tokyo\nD. Beijing\n\n答案:A\n考查英文常识.根据', " to go out for a walk. I'm wearing my favorite pair of jeans, a white t-shirt, and...
```
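For reference, the `max_diff` values above come from comparing prefill logits between backends. A minimal sketch of that kind of comparison using plain transformers (the model id is a placeholder, and `srt_logits` is a stand-in for what `SRTRunner` would return in the real test):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; the CI test iterates over several models.
model_id = "Qwen/Qwen2-1.5B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    hf_logits = model(ids).logits[0]  # [seq_len, vocab_size]

# In the real test these come from SRTRunner; use a stand-in here.
srt_logits = hf_logits.clone()

max_diff = torch.max(torch.abs(hf_logits - srt_logits))
print(f"max_diff {max_diff}")  # values around 1e-2 are typical fp16 kernel noise
```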
> You can verify SGLang's accuracy using https://github.com/fw-ai/llm_eval_meta to **match the data in the official Llama 3.1 tech report**. Regarding the issue you mentioned about...
> @cherishhh It seems you didn't understand my previous reply. Currently, the eval scores of SGLang and the official scores of Llama 3.1 are **consistent**; there is **no issue with...
> > The output of gemma-2-2b from SRT is unstable on the commented prompt.
>
> Google's [Gemma-2 model](https://arxiv.org/abs/2408.00118) uses interleaved window attention to reduce computational complexity for long contexts,...
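For readers unfamiliar with the term, here is a minimal sketch of what interleaved (sliding-window / global) attention masking looks like; the layer count and window size are illustrative, not Gemma-2's actual configuration:

```python
import torch

def causal_mask(seq_len: int, window: int | None = None) -> torch.Tensor:
    """Boolean mask where True means 'query i may attend to key j'."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    mask = j <= i  # standard causal constraint
    if window is not None:
        mask &= (i - j) < window  # keep only the last `window` keys
    return mask

# Alternate sliding-window and global layers, Gemma-2 style (toy sizes).
seq_len, window, num_layers = 8, 4, 4
for layer in range(num_layers):
    w = window if layer % 2 == 0 else None
    kind = f"sliding(window={w})" if w else "global"
    print(f"layer {layer}: {kind}, attended keys per query:",
          causal_mask(seq_len, w).sum(dim=1).tolist())
```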
Not just Qwen: testing Llama3 also showed this phenomenon.

> I also noticed that for the qwen model, when the output length exceeds 32, using HFRunner and SRTRunner results...
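A minimal way to reproduce that comparison, assuming the `HFRunner`/`SRTRunner` interface used in the CI test linked earlier (constructor and `forward` arguments may differ between sglang versions):

```python
# Sketch only: interfaces assumed from test/srt/models/test_generation_models.py.
from sglang.test.runners import HFRunner, SRTRunner

model_path = "Qwen/Qwen2-1.5B"  # placeholder model
prompts = ["The capital of France is"]
max_new_tokens = 64  # > 32, where the divergence was reported

with HFRunner(model_path) as hf_runner:
    hf_outputs = hf_runner.forward(prompts, max_new_tokens=max_new_tokens)
with SRTRunner(model_path) as srt_runner:
    srt_outputs = srt_runner.forward(prompts, max_new_tokens=max_new_tokens)

for hf_str, srt_str in zip(hf_outputs.output_strs, srt_outputs.output_strs):
    print("match" if hf_str == srt_str
          else f"diverged:\nHF : {hf_str}\nSRT: {srt_str}")
```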
> @cherishhh @Abdulhanan535 @tanmaylaud You can take a look at the implementation of https://github.com/fw-ai/llm_eval_meta/blob/main/analyze_answers.py. When evaluating, to determine if the model answer is correct, you can refer to https://github.com/fw-ai/llm_eval_meta/blob/b1166abf1395eafd3a994aefed5f6a420e697289/analyze_answers.py#L107-L119 **It...
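The exact matching logic lives at the linked L107-L119. As a purely hypothetical sketch of that kind of multiple-choice check (the real code may normalize answers differently):

```python
import re

def is_correct(model_answer: str, target: str) -> bool:
    """Hypothetical matcher; not the actual analyze_answers.py code."""
    # Take the first standalone choice letter in the model's output.
    m = re.search(r"\b([A-D])\b", model_answer)
    return m is not None and m.group(1) == target.strip().upper()

print(is_correct("The answer is B.", "B"))      # True
print(is_correct("I would pick C here.", "B"))  # False
```

The point such logic illustrates is that scoring depends on how the answer letter is extracted from free-form output, which is why small formatting differences between backends can move eval scores.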