hifi-gan
Multi GPU checkpoints
Hello,
First of all, this is a nice paper.
I'm having problems with the way you checkpoint. I was training on a multi-GPU server (5 GPUs), and looking into the code I saw that you save the model via its module attribute when using multiple GPUs. The problem I'm facing is that the checkpoint (for example g_01000000) shows up as a zip archive (that is what Linux reports), and I can't figure out how to load it.
Thanks for your help
I've run into this issue recently. The best way is to go back to a previous checkpoint (e.g. g_09500000) and continue training from that step; the last checkpoint may not have been saved completely (i.e. it was corrupted).
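In case it helps: since PyTorch 1.6, torch.save writes a zip-based file by default, so Linux reporting the checkpoint as a zip archive is expected and does not mean it is broken. It should still load with torch.load. Below is a minimal sketch, not code from the repository, for checking whether a checkpoint was written completely and for falling back to the newest intact one. The function names are hypothetical, and it assumes checkpoints are named with zero-padded step numbers (as in g_01000000) so that sorting by name matches sorting by step.

```python
import zipfile
from pathlib import Path


def checkpoint_looks_intact(path):
    """Return True if `path` is a readable zip archive whose members pass CRC checks."""
    path = Path(path)
    if not path.is_file() or not zipfile.is_zipfile(path):
        # A truncated save usually lacks the zip end-of-archive record.
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        return False


def latest_intact_checkpoint(directory, prefix="g_"):
    """Return the newest checkpoint in `directory` that passes the check, or None.

    Assumes zero-padded step numbers, so name order equals step order.
    """
    candidates = sorted(Path(directory).glob(prefix + "*"), reverse=True)
    for candidate in candidates:
        if checkpoint_looks_intact(candidate):
            return candidate
    return None


def load_checkpoint(path, device="cpu"):
    """Load the checkpoint dict with torch.load (requires PyTorch)."""
    import torch  # imported lazily so the checks above work without PyTorch
    return torch.load(path, map_location=device)
```

If the newest file fails the check, resuming from whatever latest_intact_checkpoint returns is the same fallback as described above. As far as I can tell, the generator weights sit under the "generator" key of the loaded dict, and because the state dict is taken from the unwrapped module, the keys should not carry a "module." prefix; please verify both against your copy of train.py.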