TensorBoard logs seem to fail during multi-GPU training
When I train the model with a single GPU on my local machine, TensorBoard works well and shows the loss curve and other metrics. However, when I train the model with multiple GPUs, TensorBoard does not seem to record anything (the TensorBoard event files are very small, only a few KB). Is this a bug in multi-GPU training? Thank you.
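The file-size observation above can be checked with a short script. This is only a minimal sketch using the Python standard library: it assumes the event files follow TensorBoard's usual naming (the file name contains `tfevents`), and the `logs` directory and the `list_event_files` helper are hypothetical names chosen for illustration, not part of the training code.

```python
import os


def list_event_files(log_dir):
    """Return sorted (path, size_in_bytes) pairs for event files under log_dir."""
    results = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            # Assumption: TensorBoard event files contain "tfevents" in their name.
            if "tfevents" in name:
                path = os.path.join(root, name)
                results.append((path, os.path.getsize(path)))
    return sorted(results)


if __name__ == "__main__":
    # "logs" is a hypothetical output directory; replace it with the real one.
    for path, size in list_event_files("logs"):
        print(f"{size:>12d} bytes  {path}")
```

Running it once against the single-GPU output directory and once against the multi-GPU one shows which event files were created and how large each is.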
This is strange, because we have not encountered this issue during multi-GPU training. We will try to reproduce it later. In the meantime, we recommend switching to cogkit for training: we have moved maintenance of cogvideo training to cogkit, which offers better training efficiency and usability.
Thank you for your valuable guidance! I will try cogkit!