zhaoyuac09
Thank you @KevinMusgrave. I would be happy to create a pull request later, after I finish more test cases here. If all the test cases pass later, I will...
> Thanks @zhaoyuac09!
>
> You mentioned you were testing this functionality. Could you add a file `tests/utils/test_distributed_xbm_queue.py` and paste in your testing code?

Thanks for checking on the issue!...
> Thanks for checking on the issue! I can add the testing code, but it has to be tested in a distributed environment to see the memory queue for...
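For illustration only, a minimal sketch of what such a per-rank memory-queue check could look like. The `embedding_memory` attribute name and the helper name are assumptions about `CrossBatchMemory`'s internals, not the library's documented API:

```python
# Hedged sketch: verify that every rank ends up with the same XBM queue.
# The embedding_memory attribute name is an assumption and may differ
# from the actual CrossBatchMemory implementation.
import torch
import torch.distributed as dist


def check_queue_is_synced(xbm_loss):
    world_size = dist.get_world_size()
    local_queue = xbm_loss.embedding_memory.clone()

    # Gather every rank's queue onto every rank.
    gathered = [torch.zeros_like(local_queue) for _ in range(world_size)]
    dist.all_gather(gathered, local_queue)

    # All ranks should hold an identical queue if the wrapper gathers
    # embeddings across ranks before enqueueing them.
    for other in gathered:
        assert torch.allclose(local_queue, other), "XBM queue differs across ranks"
```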
> If you can make `efficient=True` have gradients equivalent to the non-distributed version, that is even better! 👍

Just added the changes for both the efficient XBM loss distributed wrapper and...
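As a rough sketch of the gradient-equivalence idea (not the actual test code): run the wrapped loss on each rank's shard, run the plain loss on the full batch with an identically initialized reference model, and compare gradients. The function and parameter names here are illustrative assumptions:

```python
# Hedged sketch: compare gradients of the efficient distributed wrapper
# against the non-distributed loss on the full batch.
import torch


def compare_gradients(ddp_model, wrapped_loss, ref_model, ref_loss,
                      local_batch, local_labels, full_batch, full_labels):
    # Distributed side: each rank sees only its shard of the batch.
    # ddp_model is assumed to be wrapped in DistributedDataParallel, so the
    # backward pass all-reduces gradients across ranks.
    ddp_model.zero_grad()
    wrapped_loss(ddp_model(local_batch), local_labels).backward()

    # Reference side: an identically initialized model sees the full batch.
    ref_model.zero_grad()
    ref_loss(ref_model(full_batch), full_labels).backward()

    # If efficient=True really matches the non-distributed computation,
    # the gradients should agree up to floating-point tolerance.
    for p_ddp, p_ref in zip(ddp_model.module.parameters(), ref_model.parameters()):
        assert torch.allclose(p_ddp.grad, p_ref.grad, atol=1e-6)
```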
> I think you're right 😆
>
> My assumption must have been based on my [existing test](https://github.com/KevinMusgrave/pytorch-metric-learning/blob/master/tests/utils/test_distributed.py) where the distributed and non-distributed model parameters are nearly the same (though...
> Thanks for adding the `test_distributed_xbm_queue` test. Can you format it so that it's a `unittest` class? Here's a simple test file you can refer to: https://github.com/KevinMusgrave/pytorch-metric-learning/blob/master/tests/distances/test_collected_stats.py
>
> After...
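For reference, a rough outline of the `unittest` structure for a spawned distributed test could look like this; the worker function is a trivial placeholder for the real per-rank check, and the names are not from the repository:

```python
# Hedged sketch of a unittest-style distributed test; _worker is a
# placeholder for the actual per-process XBM queue assertions.
import unittest
import torch.multiprocessing as mp

WORLD_SIZE = 2


def _worker(rank, world_size):
    # Placeholder: the real test would initialize the process group here and
    # run the distributed checks. Any exception raised in a worker makes
    # mp.spawn (and therefore the unittest) fail.
    assert 0 <= rank < world_size


class TestDistributedXBMQueue(unittest.TestCase):
    def test_queue_is_synced(self):
        # Spawn one process per rank and wait for all of them to finish.
        mp.spawn(_worker, args=(WORLD_SIZE,), nprocs=WORLD_SIZE)


if __name__ == "__main__":
    unittest.main()
```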
> Thank you for your effort!

Hello @KevinMusgrave, thank you for your patience. I have finished fixing `efficient=True` for the distributed regular loss. Even though I cannot reproduce the...
> distributed

Do you know if it's possible to test distributed training on CPU-only machines? It'd be nice to have the distributed tests run as part of the GitHub...
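For what it's worth, the `gloo` backend runs entirely on CPU, so a worker along these lines should be usable on a CPU-only runner. This is only a sketch; the address and port values are placeholders:

```python
# Hedged sketch: initializing a process group on a CPU-only machine.
# The gloo backend does not require a GPU; MASTER_ADDR/MASTER_PORT are
# placeholder values for the default env:// rendezvous.
import os
import torch
import torch.distributed as dist


def init_cpu_process_group(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Quick sanity check that collectives work on CPU tensors:
    # each rank contributes its rank, so the sum is 0 + 1 + ... + (world_size - 1).
    t = torch.ones(1) * rank
    dist.all_reduce(t)
    assert t.item() == sum(range(world_size))

    dist.destroy_process_group()
```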