Real-Time-Voice-Cloning
Not utilizing full GPU at training
I have an RTX 6000. During training the GPU is not at 100% utilization; it bounces between 0% and 25%, and VRAM usage sits around 5 GB out of the available 50.1 GB.
Any suggestions?
Server is provided by: https://lambdalabs.com/cloud/dashboard/instances
Something else is bottlenecking performance. Since it's a cloud server, the first thing I would suspect is the filesystem. Make sure the dataset is hosted on some kind of fast, local storage like an SSD. If it's network-based storage, try copying it to `/tmp` or `/dev/shm` if it fits. To increase VRAM utilization you can increase the batch size; this usually also helps the model converge faster.
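
If you want to test the storage hypothesis, you can stage the dataset on a RAM-backed filesystem before launching training. A minimal sketch; the source path here is an assumption, and you would point the training script's dataset root at the new location:

```python
import shutil
from pathlib import Path

# Hypothetical paths -- replace with wherever the dataset actually lives.
src = Path("/home/ubuntu/datasets/LibriSpeech")  # possibly network-backed
dst = Path("/dev/shm/LibriSpeech")               # tmpfs (RAM-backed)

# /dev/shm is typically capped at half of system RAM, so check that the
# dataset fits before copying.
if not dst.exists():
    shutil.copytree(src, dst)

print(f"Dataset staged at {dst}; pass this path as the training script's dataset root.")
```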
- Is a higher batch size always better in this case? [I've seen in a paper that a higher batch size can lead to an increased error rate, so bigger isn't always better?]
- It's SSD-based storage; still not much improvement.
I also trained the model on an RTX A6000 and saw the same thing: GPU utilization is high for only a small fraction of the training time, and most of the time it sits at 0%. It seems that other operations take most of the time and are likely the bottleneck of the training process.
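
One way to confirm where the time actually goes is to split wall time into loader wait vs. GPU compute inside the training loop. This is a generic PyTorch sketch, not this repo's actual loop; `loader`, `model`, `loss_fn`, and `optimizer` stand in for whatever the training script uses:

```python
import time
import torch

def profile_epoch(loader, model, loss_fn, optimizer, device="cuda"):
    """Split one epoch's wall time into loader wait vs. GPU compute."""
    data_time = compute_time = 0.0
    end = time.perf_counter()
    for inputs, targets in loader:
        data_time += time.perf_counter() - end  # time blocked on the DataLoader

        step_start = time.perf_counter()
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        torch.cuda.synchronize()                # wait for queued GPU work to finish
        compute_time += time.perf_counter() - step_start

        end = time.perf_counter()
    total = data_time + compute_time
    print(f"loader wait: {data_time:.1f}s ({100 * data_time / total:.0f}%), "
          f"GPU compute: {compute_time:.1f}s")
```

If loader wait dominates, the GPU is being starved by the input pipeline; faster storage, more `num_workers` on the `DataLoader`, or `pin_memory=True` are the usual levers to try.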