Leo Zhao comments


CodeGen inference error "synNodeCreateWithId failed for node: batch_gemm with synStatus 26"

This is Gaudi2D; on this SKU, MME FP32 is disabled.

[Bug]: inter-token latency is lower than TPOT in serving benchmark result

Check the code here: https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/benchmarks/benchmark_serving.py#L271 The output length used in the TPOT calculation is based on the tokenized length of the generated text instead of the real number of output tokens from the model. If the output is wrong, then output...
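To illustrate the mismatch, here is a minimal sketch, not the benchmark's actual code. It assumes TPOT is computed as (latency - TTFT) / (output_len - 1) and that inter-token latency is the mean gap between streamed tokens. The function names and all numbers are hypothetical.

```python
def tpot(latency: float, ttft: float, output_len: int) -> float:
    """Time per output token, excluding the first token (assumed formula)."""
    if output_len <= 1:
        return 0.0
    return (latency - ttft) / (output_len - 1)


def mean_itl(gaps: list[float]) -> float:
    """Mean inter-token latency over the observed gaps between tokens."""
    return sum(gaps) / len(gaps)


# Hypothetical request: the model streams 10 tokens, so there are 9 gaps.
gaps = [0.02] * 9
ttft = 0.1
latency = ttft + sum(gaps)

actual_tokens = 10    # tokens the model really produced
retokenized_len = 7   # hypothetical length after re-tokenizing the output text

tpot_actual = tpot(latency, ttft, actual_tokens)
tpot_retokenized = tpot(latency, ttft, retokenized_len)

print(f"mean ITL:            {mean_itl(gaps):.4f} s")
print(f"TPOT (actual count): {tpot_actual:.4f} s")
print(f"TPOT (re-tokenized): {tpot_retokenized:.4f} s")
```

With the actual token count, TPOT matches the mean inter-token latency. With the shorter re-tokenized length, the same decode time is divided over fewer tokens, so TPOT comes out higher than the inter-token latency.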

Qwen2-72B inference on 8x Gaudi2 gets OOM issue due to missing meta-device support on model loading

Understood. We need changes in both optimum-habana and deepspeed-fork. I will try to submit a JIRA ticket through the internal system.

Qwen2-72B inference on 8x Gaudi2 gets OOM issue due to missing meta-device support on model loading

https://github.com/huggingface/optimum-habana/pull/1151 is the fix for this issue; it also depends on a new deepspeed version.

Qwen2-72B inference on 8x Gaudi2 gets OOM issue due to missing meta-device support on model loading

Sure, I will verify on the latest 1.17.

Qwen2-72B inference on 8x Gaudi2 gets OOM issue due to missing meta-device support on model loading

Verified on 1.17; it is fixed.