
Multi GPU checkpoints

Open LaughingC0ffin opened this issue 2 years ago • 1 comments

Hello,

First of all, this is a nice paper.

I'm having problems with the way you save checkpoints. I trained this on a multi-GPU server (5 GPUs), and looking into the code I saw that you save the model as module when using multiple GPUs. The problem I'm facing is that the checkpoint (for example g_01000000) shows up as a zip archive (that is what Linux reports), and I can't figure out how to load it.

Thanks for your help

LaughingC0ffin, May 03 '22 12:05

I've run into this issue recently. The best approach is to use the previous checkpoint (e.g. g_09500000) and continue training from that step; the last checkpoint may have failed to save completely (i.e. it is corrupted).

nampdn, Jun 05 '22 04:06
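A note on the zip observation: since PyTorch 1.6, torch.save writes a zip-based file by default, so a checkpoint being reported as a zip archive is expected and does not by itself mean it is broken. torch.load reads that format directly. The sketch below is one possible way to tell an intact checkpoint from a partially written one before loading it. The function names are made up for this illustration, the "generator" key and the "module." prefix handling are assumptions about how the checkpoint was written, and the integrity check only applies to zip-format checkpoints.

```python
import zipfile


def zip_checkpoint_is_intact(path):
    """Return True if `path` is a zip archive whose members all pass their CRC check.

    Only meaningful for zip-format checkpoints (the torch.save default since
    PyTorch 1.6); an older pickle-only checkpoint would return False here.
    """
    if not zipfile.is_zipfile(path):
        # Missing file, truncated file, or not a zip archive at all.
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


def load_generator_weights(path, device="cpu"):
    """Load a checkpoint with torch.load and return the generator state dict."""
    import torch  # imported here so the integrity check above works without PyTorch

    checkpoint = torch.load(path, map_location=device)
    # Assumption: the generator weights are stored under the "generator" key.
    state_dict = checkpoint["generator"]
    # Assumption: if the weights were saved from a wrapped multi-GPU model,
    # the keys carry a "module." prefix, which is stripped here.
    prefix = "module."
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }
```

If zip_checkpoint_is_intact returns False for the newest file, falling back to the previous checkpoint as suggested above is the simplest fix. Otherwise the dictionary returned by load_generator_weights can be passed to load_state_dict on an unwrapped generator.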