
[Feature request] VRAM calculation + precaution warning message

Open TrycsPublic opened this issue 2 years ago • 2 comments

Calculate the amount of VRAM (GPU memory) that will be used in the first stage of trainer.fit, based on the batch size, dataset, and other variables.

Sometimes it takes hours to find out which config best fits the current hardware. That is what happened when I left it training overnight: after 35,000 steps it ran out of VRAM without any warning and without saving the model before crashing, wasting over $40.

Solution

Give a warning when the current config might overflow the available VRAM and throw an error; this would prevent wasting costly $$$ on servers sitting idle overnight. Alternatively, preallocate all the VRAM the model is going to use.
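
As a minimal sketch of what such a pre-flight warning could look like, assuming PyTorch and a single CUDA device (`warn_if_vram_may_overflow` and its arguments are hypothetical, not anything in Coqui TTS; note the peak of the first step tends to underestimate later growth from optimizer state):

```python
# Hypothetical pre-flight check: run one training step on a sample batch
# and warn if its peak VRAM use is close to the device total.
import warnings
import torch

def warn_if_vram_may_overflow(model, sample_batch, compute_loss, headroom=0.9):
    torch.cuda.reset_peak_memory_stats()
    loss = compute_loss(model, sample_batch)  # one representative step
    loss.backward()
    peak = torch.cuda.max_memory_allocated()
    total = torch.cuda.get_device_properties(0).total_memory
    if peak > headroom * total:
        warnings.warn(
            f"One step peaked at {peak / 2**30:.1f} GiB of "
            f"{total / 2**30:.1f} GiB VRAM; this config may OOM later."
        )
```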

TABLE

If anyone has a good table of batch size vs. VRAM, please post it here.

TrycsPublic avatar Aug 15 '22 20:08 TrycsPublic

There is ongoing work on this here, but it is not easy at all.

erogol avatar Aug 22 '22 07:08 erogol


There is a branch I'm working on in our Trainer repo that will auto-calculate the max batch size for you, so you don't have to worry about it: https://github.com/coqui-ai/Trainer/tree/largest_batch_size_finder. Currently it only finds the largest batch size possible for training, but giving a warning for people trying to use different batch sizes is a good idea.

loganhart02 avatar Sep 09 '22 06:09 loganhart02
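
For reference, the usual technique behind such a finder is to double the batch size until a CUDA OOM is raised, then keep the last size that succeeded. A minimal sketch assuming PyTorch; `run_one_step` is a hypothetical callback, and this is not the code in the linked branch:

```python
import torch

def find_largest_batch_size(run_one_step, start=1, limit=2048):
    """Double the batch size until a CUDA OOM; return the last size
    that completed one forward/backward step successfully."""
    batch_size, largest_ok = start, None
    while batch_size <= limit:
        try:
            run_one_step(batch_size)   # hypothetical: one training step
            largest_ok = batch_size
            batch_size *= 2
        except RuntimeError as e:      # CUDA OOM surfaces as RuntimeError
            if "out of memory" not in str(e):
                raise
            torch.cuda.empty_cache()
            break
    return largest_ok
```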

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

stale[bot] avatar Oct 09 '22 06:10 stale[bot]