Leo Zhao
On the Gaudi2D SKU, MME FP32 is disabled.
Check the code here: https://github.com/vllm-project/vllm/blob/a2469127db6144eedb38d0b505287c0044e4ce06/benchmarks/benchmark_serving.py#L271 The output length used in the TPOT calculation is based on the tokenized length of the generated text instead of the real number of output tokens from the model. If the output is wrong, then output...
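To illustrate the TPOT point, here is a minimal sketch, not the benchmark's actual code. It assumes TPOT is computed as (latency - TTFT) / (output_len - 1). The names `tpot` and `retokenized_len` are hypothetical, the whitespace split only stands in for a real tokenizer, and the timing numbers are made up for the example.

```python
def tpot(latency_s: float, ttft_s: float, output_len: int) -> float:
    """Time per output token, excluding the first token (assumed formula)."""
    if output_len <= 1:
        raise ValueError("need at least two output tokens")
    return (latency_s - ttft_s) / (output_len - 1)


def retokenized_len(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: count whitespace-separated pieces.
    return len(text.split())


# Made-up request timings, for illustration only.
latency_s, ttft_s = 2.5, 0.5

# Number of tokens the model actually generated (assumed known from the server).
generated_tokens = 101

# A wrong (e.g. repetitive) output whose text re-tokenizes to fewer tokens.
wrong_text = "a " * 51

tpot_actual = tpot(latency_s, ttft_s, generated_tokens)
tpot_retokenized = tpot(latency_s, ttft_s, retokenized_len(wrong_text))

print(f"TPOT from generated token count: {tpot_actual:.4f} s")
print(f"TPOT from re-tokenized text:     {tpot_retokenized:.4f} s")
```

With the same latency, the two token counts give different TPOT values, so a wrong output can skew the reported number even when the device speed is unchanged.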
Understood, we need changes in both optimum-habana and deepspeed-fork. I will try to submit a JIRA ticket through the internal system.
https://github.com/huggingface/optimum-habana/pull/1151 is the fix for this issue; it also depends on the new deepspeed version.
Sure, I will verify on the latest 1.17.
Verified on 1.17; it is fixed.