Iman Tabrizian
/bot run --only-multi-gpu-test --disable-fail-fast
/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-Others-1,DGX_H200-8_GPUs-PyTorch-[Post-Merge]"
/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-Others-1,DGX_H200-8_GPUs-PyTorch-[Post-Merge]" --disable-fail-fast
/bot reuse-pipeline
/bot run --stage-list "DGX_H200-8_GPUs-PyTorch-[Post-Merge]"
/bot reuse-pipeline
@GuanLuo I think the `response_iterator` was relying on a queue inside the request object, which might be the root cause of this issue.
@nsealati It looks like the number of tokens is larger than expected; could you please double-check that the client is sending exactly `3500` tokens to the server? It looks...
Closing since this feature has been completed.