Wenxuan Tan
Thanks @yuanheng-zhao. There seems to be a duplicate issue, #5673, even with the newest version, though we haven't been developing the auto-parallel API for a while 😂
This is a common error caused by a version mismatch between your system's global NVIDIA driver (12.2) and PyTorch's CUDA (12.1). You should comment out this
Try reinstalling your system NVIDIA driver so that the versions match?
Did you count the classification head?
Oh, sorry. I also wonder if you were able to reproduce any of the experimental results?
Many thanks! I suspected that multiplying multiple butterfly matrices may be inefficient in both speed and memory because the activations have to be stored. I also wonder if this is your ongoing work (Trying to...
If I remember correctly, in PyTorch it's not straightforward to enable gradients for one half of a matrix and disable them for the other half.
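One common workaround is to keep the whole matrix as a single parameter and zero out the gradient of the frozen part with a tensor hook. The snippet below is only a minimal sketch of that idea, not anything from the repo under discussion: the 4x4 `weight`, the `mask`, and the toy loss are all hypothetical names and values chosen for illustration.

```python
import torch

# Hypothetical 4x4 weight; only the top two rows should receive gradients.
weight = torch.nn.Parameter(torch.ones(4, 4))

# 1 where the gradient is kept, 0 where it is dropped.
mask = torch.zeros(4, 4)
mask[:2] = 1.0

# The hook runs during backward and replaces the gradient with the masked one.
weight.register_hook(lambda grad: grad * mask)

# Toy loss whose unmasked gradient is 2.0 for every entry.
loss = (weight * 2.0).sum()
loss.backward()

trainable_grad = weight.grad[:2]  # rows that still get a gradient
frozen_grad = weight.grad[2:]     # rows whose gradient was zeroed
```

Note that a zero gradient is not the same as a frozen value: an optimizer with weight decay can still change the masked entries, so splitting the matrix into two separate tensors may be the safer option in that case.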
Still not solved. Any help, please?
> The error you mentioned earlier, torch.distributed.elastic.multiprocessing.errors.ChildFailedError, typically occurs when one of the child processes launched by torchrun encounters an error and fails to execute properly. It is difficult to...
> The checkpoints are hosted on GitHub. Which one are you having trouble downloading?

None of the checkpoints can be downloaded.