Iman Tabrizian

Results: 171 comments by Iman Tabrizian

/bot run --only-multi-gpu-test --disable-fail-fast

/bot run --only-multi-gpu-test --disable-fail-fast

/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-Others-1,DGX_H200-8_GPUs-PyTorch-[Post-Merge]"

/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-Others-1,DGX_H200-8_GPUs-PyTorch-[Post-Merge]" --disable-fail-fast

/bot run --stage-list "DGX_H200-8_GPUs-PyTorch-[Post-Merge]"

@GuanLuo I think the `response_iterator` was relying on a queue inside the request object, which might be the root cause of this issue.

@nsealati It looks like the number of tokens is larger than expected. Could you please double-check that the client is sending exactly `3500` tokens to the server? It looks...

Closing since this feature has been completed.