Run Google Colab Dreambooth training on Vast.ai or Runpod.io

Open bach777 opened this issue 2 years ago • 24 comments

Due to the limits of the Google Colab free tier, training at resolutions of 768x768 and above, or even at 512x512, is not viable. I understand that it takes many steps to get impeccable results, so I was wondering whether there is a way to use your Google Colab template on a GPU rental site, or to run it in a local Colab session connected to a rented GPU. Is it possible? Thank you very much for your kind assistance!

bach777 avatar Nov 16 '22 02:11 bach777

Yes. AItrepreneur has plenty of Runpod tutorials. https://www.youtube.com/c/Aitrepreneur/videos

GuruVirus avatar Nov 16 '22 03:11 GuruVirus

Runpod costs an arm and a leg, and that's if a decent client is available. Google colab costs $10 a month.

wktra avatar Nov 16 '22 03:11 wktra

The price of an RTX A5000 is $0.49/hour on demand.

bach777 avatar Nov 16 '22 03:11 bach777

Each training takes me about 4 hours for actual quality. That's about $2 right there. And then there are the disk charges ($4/mo Disk Charge). I train and tweak a few times a day. In less than a week, I'll be paying out the nose.
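
A rough sketch of that arithmetic, using the $0.49/hour rate and $4/month disk charge quoted in this thread. The figure of three runs per day is only an assumption standing in for "a few times a day", and `rental_cost` is a hypothetical helper, not part of any provider's tooling.

```python
def rental_cost(hours_per_run, runs_per_day, days, hourly_rate, monthly_disk_fee=0.0):
    """Estimate GPU rental cost in dollars for a number of days of training."""
    gpu_cost = hours_per_run * runs_per_day * days * hourly_rate
    # Pro-rate the monthly disk charge over the period (assumes a 30-day month).
    disk_cost = monthly_disk_fee * days / 30
    return round(gpu_cost + disk_cost, 2)


# One 4-hour run at $0.49/hour, no disk charge.
single_run = rental_cost(4, 1, 1, 0.49)

# One week at an assumed 3 runs per day, plus the pro-rated disk charge.
week = rental_cost(4, 3, 7, 0.49, monthly_disk_fee=4.0)

print(f"single run: ${single_run}, one week: ${week}")
```

Under those assumptions a single run comes to roughly $2, while a week of heavy use costs several times the $10 monthly Colab subscription mentioned above.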

wktra avatar Nov 16 '22 03:11 wktra

Thank you very much for the info, I will try to do it on Google Colab with the monthly tier. I hope they accept payments from my country

bach777 avatar Nov 16 '22 04:11 bach777

I gotta be honest, I had trouble with my USA card when I tried to subscribe. I had to call my bank, tell them to not reject google colab and then try again.

wktra avatar Nov 16 '22 04:11 wktra

I have trained a model (3000 steps) with "Continue training" checked and "Enable_text_encoder_training:" unchecked. Two models named A1_step_1000.ckpt and A1_step_2000.ckpt are on my hard disk. It takes 4000-5000 steps to complete the training, right? When I try to load the session, I get the message "Previous model not found, training a new model..." and the training starts from the beginning. What should I do?

bach777 avatar Nov 16 '22 04:11 bach777

Just a warning: an acquaintance tested Colab premium ($50) and Dreambooth would not run on the provided hardware.

As far as restarting a session, is your session name the same?

100 steps per image worked well with SD1.4, but with 1.5 I haven't heard of people getting good results worth reproducing.

GuruVirus avatar Nov 16 '22 04:11 GuruVirus

[Screenshot (380)] [Screenshot (381)] I used A1 only as an example; this is the model that I am trying to train. Apparently everything is correct, but the training restarts from 0.

bach777 avatar Nov 16 '22 04:11 bach777

The training is restarting because there is no final ckpt, only the intermediate checkpoints. If you want to resume anyway, rename one of the models to "sora.ckpt".
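
A minimal sketch of that workaround. It assumes the intermediate checkpoints follow the `<name>_step_<N>.ckpt` pattern mentioned earlier in the thread and that the notebook looks for `<session_name>.ckpt` in the same folder. The folder path and the `promote_latest_checkpoint` helper are hypothetical, so adjust them to wherever your session is actually stored.

```python
import re
import shutil
from pathlib import Path


def promote_latest_checkpoint(session_dir, session_name):
    """Copy the highest-step intermediate checkpoint to <session_name>.ckpt."""
    session_dir = Path(session_dir)
    step_pattern = re.compile(r"_step_(\d+)\.ckpt$")

    # Collect (step, path) pairs for files such as A1_step_2000.ckpt.
    candidates = []
    for path in session_dir.glob("*.ckpt"):
        match = step_pattern.search(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        raise FileNotFoundError(f"no intermediate checkpoints in {session_dir}")

    _, latest = max(candidates, key=lambda item: item[0])
    target = session_dir / f"{session_name}.ckpt"
    # Copying (rather than renaming) keeps the original intermediate file.
    shutil.copy2(latest, target)
    return target


# Example call (hypothetical path):
# promote_latest_checkpoint("/path/to/Sessions/sora", "sora")
```

Copying instead of renaming is a deliberate choice here so the intermediate checkpoint is still available if the resumed training goes wrong; a plain rename, as suggested above, works as well when disk space is tight.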

TheLastBen avatar Nov 16 '22 05:11 TheLastBen

I'm having a problem running it on Paperspace: it's just stuck at cloning the repo.

emidio90 avatar Nov 19 '22 10:11 emidio90

See also:

  • https://github.com/TheLastBen/fast-stable-diffusion/issues/80
  • https://github.com/TheLastBen/fast-stable-diffusion/pull/150
  • https://github.com/TheLastBen/fast-stable-diffusion/issues/448
  • https://github.com/TheLastBen/fast-stable-diffusion/issues/704

0xdevalias avatar Dec 02 '22 23:12 0xdevalias

idk if I'm too late, but check this out. It has all the performance modifications from Shivam's and TheLastBen's notebooks, with xformers: https://github.com/SU1199/fastBooth

SU1199 avatar Jan 10 '23 10:01 SU1199

I tried running ShivamShrirao and TheLastBen on Runpod and Vast.ai. Training is working fine, but the model was not able to generate the user-given images after training. It works fine in Colab. What might be the reason? It would be a great help if anyone could help me with this. Thank you @SU1199

Shadhil24 avatar Mar 28 '23 05:03 Shadhil24

@Shadhil24 I made a template for Runpod https://www.runpod.io/console/gpu-secure-cloud?template=runpod-stable-unified

TheLastBen avatar Mar 28 '23 06:03 TheLastBen

Can I run this on Vast.ai? Sorry if this is a stupid question, I am new to this.

Shadhil24 avatar Mar 28 '23 07:03 Shadhil24

The template is designed for runpod

TheLastBen avatar Mar 28 '23 09:03 TheLastBen

Is it necessary to give the images the same name as the instance?

Shadhil24 avatar Mar 28 '23 09:03 Shadhil24

Yes, the instance name is determined by the image filenames.

TheLastBen avatar Mar 28 '23 09:03 TheLastBen

Sorry for asking more questions, but how can I give the same name to every image? Should I number them, like shadhil_1.png, shadhil_2.png, shadhil_3.png?

Shadhil24 avatar Mar 28 '23 10:03 Shadhil24

Yes, that's a correct format, but don't use known words; use a random token like "bvhrghc".
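
A small sketch of how the renaming could be scripted, assuming the instance images sit in one folder and using the `token_N.ext` numbering described above. The `rename_instance_images` helper and the folder path are hypothetical, not part of the notebook.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def rename_instance_images(image_dir, token):
    """Rename every image in image_dir to <token>_<n>.<ext>, numbered from 1."""
    image_dir = Path(image_dir)
    images = sorted(
        p for p in image_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

    renamed = []
    for index, path in enumerate(images, start=1):
        target = path.with_name(f"{token}_{index}{path.suffix.lower()}")
        if target != path:
            # Refuse to overwrite an existing file with the same target name.
            if target.exists():
                raise FileExistsError(f"{target} already exists")
            path.rename(target)
        renamed.append(target)
    return renamed


# Example call (hypothetical folder), using the random token from the comment above:
# rename_instance_images("/path/to/instance_images", "bvhrghc")
```

Only the token goes into the filename here; no class name is added.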

TheLastBen avatar Mar 28 '23 11:03 TheLastBen

@TheLastBen Thank you, it's working fine now. I have changed my image names to use the instance and class names.

Shadhil24 avatar Mar 29 '23 07:03 Shadhil24

don't use a class name though, only the token

TheLastBen avatar Mar 29 '23 07:03 TheLastBen

How to use runpod with colab?

MohammadKatif avatar Jun 29 '24 10:06 MohammadKatif