Shuai Xie

Results 22 comments of Shuai Xie

I guess the problem may arise from **Argo**'s mechanism, because kfp adds a suffix to runs with the same name by default. By the way, is the suffix...

@Simardeep27 @FengLi-ust Me too. I guess OpenMMLab's mmdeploy may be a workaround, as it supports DETR and DeformableDETR, and DINO is being integrated now. https://github.com/open-mmlab/mmdetection/pull/9149

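> For the KV Cache part, @ZhangJianwei0311, please take a look and see whether you have any comments. Thanks for the reply, @jklj077 🙏 [@zhangjianwei033](https://github.com/ZhangJianwei0311), could you please take some time to look into this? Thanks a lot~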

> The `version` in title means the version of PyTorch instead of PyTorchJob. Let's fix it on 1.8.0 and see how the difference is introduced. Oh yes. I'm sorry to...

Thanks for your kind reply @zw0610 @gaocegege. I'll fix the PyTorch version at 1.8.0 in the following experiments and look forward to figuring out this problem soon with your help....

Maybe you can have a look at what I did in this issue: https://github.com/kubeflow/pytorch-operator/issues/354#issue-999999536. Best wishes.

Hi, @johnugeorge, I also ran into the same problem as @SeibertronSS. I want to accelerate PyTorchJob training to **achieve a training speed comparable to bare metal**....

Yes, @gaocegege. Checkpoints can do this job. In this way, we have to define what and when to save. - what: users have to tell us what they want to...