Chengyuan Li

Results 6 comments of Chengyuan Li

You haven't installed the `torch` package. You can use Anaconda to manage your Python packages.
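A quick way to confirm whether the package is visible to the interpreter you are actually running is sketched below. The helper name `is_installed` is made up for illustration; it only relies on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found by the current interpreter.

    `is_installed` is a hypothetical helper name; it does not import the package,
    it only asks the import system whether it can locate it.
    """
    return importlib.util.find_spec(package_name) is not None


# Example usage (prints True or False depending on your environment):
# print(is_installed("torch"))
```

If this returns `False` inside the environment you launch your script from, the package is missing from that environment, even if it is installed somewhere else on the machine.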

> 3. Is there any way to monitor the highest GPU memory use during a time period? I only have a toy plan: collect multiple records during a time period...
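The polling idea in the quoted question could look roughly like the sketch below. It is only an illustration: the function names `read_gpu_memory_mib` and `peak_over_period` are made up here, and the reader assumes `nvidia-smi` is on the `PATH`. It samples used memory at a fixed interval and keeps the maximum.

```python
import subprocess
import time
from typing import Callable


def read_gpu_memory_mib(gpu_index: int = 0) -> int:
    """Read the memory currently used on one GPU, in MiB, via nvidia-smi.

    Assumption: nvidia-smi is installed and the GPU index exists.
    """
    out = subprocess.check_output(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            "--query-gpu=memory.used",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return int(out.strip().splitlines()[0])


def peak_over_period(
    read_fn: Callable[[], int], duration_s: float, interval_s: float = 0.1
) -> int:
    """Poll read_fn until duration_s has elapsed and return the largest value seen."""
    deadline = time.monotonic() + duration_s
    peak = read_fn()
    while time.monotonic() < deadline:
        time.sleep(interval_s)
        peak = max(peak, read_fn())
    return peak


# Example usage on a machine with an NVIDIA GPU:
# peak = peak_over_period(read_gpu_memory_mib, duration_s=10.0, interval_s=0.1)
# print(f"peak used memory: {peak} MiB")
```

Polling can miss spikes shorter than the sampling interval. If the workload runs in your own PyTorch process, `torch.cuda.max_memory_allocated()` reports the peak tracked by PyTorch's allocator, which may be a better fit than sampling from outside.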

> 1. Can you share your code for sending the HTTP request? Can you correctly run this example? https://github.com/sgl-project/sglang?tab=readme-ov-file#usage. The warning is unexpected.
> 2. sglang outperforms vllm because of...

Thank you! Those benchmarks helped a lot. However, I hit an error when running [multi_turn_chat](https://github.com/sgl-project/sglang/tree/main/benchmark/multi_turn_chat) after changing `args.turns` to 6 in `bench_sglang.py`. Here is the log: > Traceback...

My command is `python3 bench_sglang.py --tokenizer my_local_qwen_model_path`, and I just added `args.turns = 6` before the `main` function.

+1. How do I benchmark the speedup? I ran the example code and didn't see any obvious acceleration. How can I reproduce the 4.04x speedup of Llama2-7b on A100?
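For reference, one generic way to measure a speedup is to time the baseline and the accelerated path under identical conditions and compare medians. The sketch below is not the project's benchmark script; `time_calls` and `speedup` are hypothetical helper names, and the two callables stand in for whatever generation functions you are comparing.

```python
import statistics
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], warmup: int = 1, iters: int = 5) -> List[float]:
    """Run fn a few times untimed, then return the wall-clock time of each timed run."""
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-time costs such as caching or compilation
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return times


def speedup(baseline_times: List[float], accelerated_times: List[float]) -> float:
    """Ratio of median baseline time to median accelerated time (above 1.0 means faster)."""
    return statistics.median(baseline_times) / statistics.median(accelerated_times)


# Example usage with two placeholder callables (assumptions, not real APIs):
# base = time_calls(run_baseline_generation, warmup=2, iters=10)
# fast = time_calls(run_accelerated_generation, warmup=2, iters=10)
# print(f"speedup: {speedup(base, fast):.2f}x")
```

When timing GPU work, each callable should wait for the device to finish before returning (for example with `torch.cuda.synchronize()` in PyTorch), otherwise the timer may only capture kernel launch time. Prompt length, output length, and batch size should also match between the two runs, or the ratio is not meaningful.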