torchrec
[bugfix] Fix error on empty sharders
When passing an empty sharder list to DMP, the sharder_map will be an empty dict, so `not sharder_map` evaluates to True and the error path fires. In our experience, it's useful to support passing empty sharders to DMP for performance comparisons.
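A minimal sketch of the truthiness pitfall described above; `Sharder`, `build_sharder_map`, and the error message are hypothetical stand-ins, not the actual torchrec source:

```python
from typing import Dict, List, Optional


class Sharder:
    """Placeholder for torchrec's ModuleSharder."""


def build_sharder_map(sharders: Optional[List[Sharder]]) -> Dict[str, Sharder]:
    sharder_map = {type(s).__name__: s for s in (sharders or [])}

    # Buggy check: an empty list yields an empty dict, which is falsy,
    # so a deliberately empty sharder list is rejected along with None.
    # if not sharder_map:
    #     raise ValueError("sharders must be provided")

    # Fixed check: only reject when no sharder list was supplied at all.
    if sharders is None:
        raise ValueError("sharders must be provided")

    return sharder_map


assert build_sharder_map([]) == {}  # an explicitly empty list now passes through
```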
Why do you want to pass an empty sharder?
@xing-liu I found that when training with DMP on 1 GPU, the model diverges within several steps, while the non-DMP version converges fine. Allowing an empty sharder list would help me determine whether the problem is in the DDP part or the sharder.
Can you share your model code?
@xing-liu I'm afraid I can't share the model... And I fixed the divergence problem yesterday; it was a misuse of torchrec on my end...