IJCAI2023-OptimalShardedDataParallel
[IJCAI2023] An automated parallel training system that combines the advantages of both data and model parallelism. If you are interested, please visit/star/fork https://github.com/Youhe-Jiang/Op...
In the profiler_output.py file, there is a list named "time_map". The first dimension corresponds to hidden_size, the second dimension to batch size, and the third dimension is a list containing 4...
Hello! I have obtained a ViT model from timm, and I want to train it using your OSDP method. However, OSDP requires torch version 1.10.2, while timm needs a higher...