
Not utilizing full GPU during training

TrycsPublic opened this issue 2 years ago · 2 comments

I have an RTX 6000. During training, the GPU is not at 100% utilization; it bounces between 0% and 25%, and VRAM usage is only about 5 GB of the available 50.1 GB.

Any suggestions?

Server is provided by: https://lambdalabs.com/cloud/dashboard/instances

TrycsPublic · May 25 '22 23:05

Something else is bottlenecking performance. Since it's a cloud server, the first thing I would suspect is the filesystem. Make sure the dataset is hosted on fast, local storage such as an SSD; if it's on network-based storage, try copying it to /tmp or /dev/shm if it fits. To increase VRAM utilization, increase the batch size, which usually also helps the model converge faster.
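For reference, here is a minimal sketch of the `DataLoader` settings that usually help when the GPU sits idle waiting on data. It assumes a standard PyTorch training setup; the dataset and paths below are placeholders, not this repo's actual classes:

```python
import shutil  # only needed if you copy the dataset

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for the project's real Dataset class.
dataset = TensorDataset(torch.randn(1024, 80), torch.randn(1024, 80))

# If the dataset lives on network storage, copy it node-local first, e.g.:
# shutil.copytree("/mnt/network/dataset", "/dev/shm/dataset")

loader = DataLoader(
    dataset,
    batch_size=64,            # raise until VRAM is well utilized
    num_workers=8,            # parallel workers hide disk/CPU preprocessing
    pin_memory=True,          # faster host-to-GPU copies
    persistent_workers=True,  # keep workers alive across epochs
)
```

The `num_workers` and `pin_memory` settings address the loader-side stall, while `batch_size` is what actually raises VRAM usage.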

raccoonML · May 26 '22 07:05

  1. Is a higher batch size always better in this case? I've seen a paper where a higher batch size led to an increased error rate, so bigger isn't always better?
  2. It's SSD-based storage, and there's still not much improvement.

TrycsPublic · May 29 '22 20:05

I also trained the model on an RTX A6000 and found that the GPU is utilized for only a small part of the training time; most of the time, utilization is at 0%. It seems that other operations take a lot of time and may be the bottleneck of the training process.
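A quick way to confirm this is to time the wait on the `DataLoader` separately from the GPU step. This is only a sketch with placeholder model and data, assuming an ordinary PyTorch training loop:

```python
import time

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholders standing in for the repo's real model, loss, and data.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(80, 80).to(device)
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters())
loader = DataLoader(
    TensorDataset(torch.randn(1024, 80), torch.randn(1024, 80)),
    batch_size=64,
)

data_time = gpu_time = 0.0
end = time.perf_counter()
for x, y in loader:
    data_time += time.perf_counter() - end  # time blocked on the loader

    t0 = time.perf_counter()
    x, y = x.to(device), y.to(device)
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if device == "cuda":
        torch.cuda.synchronize()            # flush async GPU work for accurate timing
    gpu_time += time.perf_counter() - t0    # time actually computing

    end = time.perf_counter()

print(f"loader wait: {data_time:.1f}s   compute: {gpu_time:.1f}s")
```

If the loader wait dominates, the fix is on the data side (storage, workers, preprocessing), not the GPU.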

huskyachao · Dec 01 '22 12:12