Results 7 comments of Jacob Schein

Running into the following issues (surfaced via `tp_worker.py`) when trying to query Llama 3.1 405B FP8 on an 8xH100 while setting `tensor_parallel_size=8`. _Note: requests to Llama 3.1 8B Instruct are...

@JianyuZhan unfortunately still running into errors after cleaning up the typos:

```
llm = LLM(model="meta-llama/Meta-Llama-3.1-405B-Instruct-FP8", tensor_parallel_size=8)
>>> prompts = ["Hi my name is"]
>>> res = llm.generate(prompts)
[17:53:56 TP0] Prefill batch. #new-seq:...
```

https://github.com/JianyuZhan/sglang/pull/1 — @JianyuZhan this compiles and addresses the typo

@JianyuZhan @zhyncs is this close to being merged? Would love to start using it.

> May you try `python3 -m sglang.bench_serving --backend sglang --num-prompts 1024` instead?

I tried running this on a similar configuration (2x8xH100). The requests hang and my server crashes.

```
python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --tp 16 --dist-init-addr 10.233.88.177:20000 --nnodes 2 --node-rank 0 --trust-remote-code --host 0.0.0.0 --port 8001
```

What I am seeing (on node 2):

```
[2025-01-06 17:48:40...
```
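For reference, the companion launch on the second node would presumably be the same command with `--node-rank 1` (a sketch of sglang's multi-node launch pattern, assuming the same `--dist-init-addr` is reachable from both nodes; not a command verified on this cluster):

```
# Hypothetical second-node command: identical flags, only the node rank changes.
# Both nodes must point at the same dist-init address (rendezvous endpoint).
python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --tp 16 --dist-init-addr 10.233.88.177:20000 --nnodes 2 --node-rank 1 --trust-remote-code --host 0.0.0.0 --port 8001
```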

Note: I was able to run the benchmark by adding additional arguments [found in this comment](https://github.com/sgl-project/sglang/issues/2741#issuecomment-2572632936):

```
python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --tp 16 --dist-init-addr 10.233.88.177:20000 --nnodes 2 --node-rank 0...
```