OSError: cannot open resource
It trains the first 332 steps, then when that's done it trains again up to 189, and then I get an "OSError: cannot open resource".
Traceback (most recent call last):
File "main.py", line 830, in
Download the DejaVuSans.ttf file from the internet and put it at the Dreambooth-Stable-Diffusion/data/DejaVuSans.ttf location.
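For reference, that download step can be scripted; here's a minimal sketch. The source URL and destination path are assumptions (any copy of DejaVuSans.ttf works; matplotlib's repo happens to bundle one), so adjust them to your setup:

```python
import os
import urllib.request

# Assumed source URL: matplotlib bundles a copy of DejaVuSans.ttf; any copy works.
FONT_URL = ("https://github.com/matplotlib/matplotlib/raw/main/"
            "lib/matplotlib/mpl-data/fonts/ttf/DejaVuSans.ttf")
# Assumed destination: adjust to wherever your repo checkout actually lives.
FONT_DEST = "Dreambooth-Stable-Diffusion/data/DejaVuSans.ttf"

def fetch_font(url=FONT_URL, dest=FONT_DEST):
    """Create the data/ folder if it doesn't exist, then download the font into it."""
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest
```

Running `fetch_font()` once before training should clear the OSError.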
Can you explain why? Also, my directory isn't laid out like that.
You’ll have to make the folder if it doesn’t exist.
The reason you’re doing this is that the program needs the font when generating text on some of the sample pictures, and the missing file causes the error.
Alright, thanks... I did the training. I tried to run it, but it wouldn't load in the hlky repo; it ends up with ^C. Have you tried using the new checkpoint in any other repo, or locally, to generate images? Or did you generate them in Dreambooth as well?
I moved my model into hlky. It should work fine. “^C” is the keyboard command Ctrl+C, which is the interrupt command on Linux/Python. You’re probably pasting something weird into the command line?
I didn't really change anything, I just gave it the location of the new model. But it was the model that crashed because of the OSError, so I've decided to retrain now. Your samples are really good! Could I ask how many regularization images you used, and how many images of yourself?
I used about 12 photos for regularization (I used “man” as my prompt and generated 12 512x512 images), and I used about the same amount of photos of me, in various lighting conditions, angles, and expressions. I also used “man” as the class that I trained to. Currently doing a test with 100 regularization images.
I trained 4000 iterations (the finetune unfrozen .yaml file is what you need to edit to train more than 800 iterations). Look at the end of the file for the global iteration cutoff threshold.
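For reference, the cutoff in question sits at the bottom of the finetune config. A sketch of the relevant fragment (key names recalled from the XavierXiao repo's v1-finetune_unfrozen.yaml; verify them against your copy before editing):

```yaml
lightning:
  trainer:
    benchmark: True
    max_steps: 800  # raise this (e.g. to 4000) to train for more iterations
```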
Well it stopped training again
Average Peak memory 29986.93MiB Epoch 2: 56%|▌| 180/322 [03:02<02:24, 1.01s/it, loss=0.358, v_num=0, train/los
Traceback (most recent call last):
File "main.py", line 835, in {loader_name}() method defined to run Trainer.{trainer_method}.")
pytorch_lightning.utilities.exceptions.MisconfigurationException: No test_dataloader() method defined to run Trainer.test.
I don't know if it finished or if that's an error. Also, for your really good results, you trained for 4000 iterations? How long did that take? I'm currently using an A100.
@Maki9009
If you want to train for a longer period of time, you can just replace trainer.test(model, data). Either that or you can use a boolean with argparse, or just pass the check.
The error happens at line https://github.com/XavierXiao/Dreambooth-Stable-Diffusion/blob/bb8f4f2dc1d8d1b9ce4f705d03621e6ac8e50028/main.py#L835
Just replace it with:
print("I don't want to test :-(")
Or
pass
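Concretely, the argparse variant could look like the sketch below. The --no_test flag name is an assumption for illustration, not something that exists in the repo; in practice you'd wire it into main.py's existing parser:

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag; the repo's main.py has its own parser you would extend instead.
parser.add_argument("--no_test", action="store_true",
                    help="skip the test stage after training finishes")
opt = parser.parse_args(["--no_test"])  # e.g. passed on the command line

if opt.no_test:
    print("I don't want to test :-(")
else:
    # trainer.test(model, data) would run here in the real script
    pass
```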
@ExponentialML
So technically it finished training? Because it was still set at 800 iterations... I'm doing 2000 now. Would I still get the same error, since I haven't changed this yet?
@Maki9009 Yes, it still technically finished training. Removing this test call allows it to train for longer (it's not the test option itself; it's a bug from missing params in this repo), but you can possibly overfit what you're trying to train.
I finished training... but I have two .ckpt files: one is called "last.ckpt" and the other is called "epoch=000001.ckpt". I don't know which one I'm supposed to use. And if I were to use it with hlky, all I need to do is point the repo to the model, right? Nothing else?
The epoch file is a save point at epoch number 000001, and the last.ckpt is the latest model that saved when training was finished. A use case is if you feel that last.ckpt has too much training, you can fall back to one of the epoch checkpoints. Either one is fine, but last.ckpt will probably have less editability but more identity preservation. It's your call on which one is best for you.
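That fallback logic can be sketched as a small helper. The directory layout (a folder containing last.ckpt plus epoch=NNNNNN.ckpt save points) is assumed from the thread above; the function name is made up for illustration:

```python
from pathlib import Path

def pick_checkpoint(ckpt_dir, prefer_last=True):
    """Return last.ckpt if present (the fully trained model), else the newest epoch save point."""
    ckpt_dir = Path(ckpt_dir)
    last = ckpt_dir / "last.ckpt"
    if prefer_last and last.exists():
        return last
    # Names like epoch=000001.ckpt are zero-padded, so lexicographic sort is chronological.
    epochs = sorted(ckpt_dir.glob("epoch=*.ckpt"))
    if epochs:
        return epochs[-1]
    raise FileNotFoundError(f"no checkpoints found in {ckpt_dir}")
```

Calling it with `prefer_last=False` gives you the newest epoch save point when last.ckpt feels overtrained.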
Use last.ckpt for the fully trained model. And yes, you can just point hlky at the model, or name it “model.ckpt” and replace the current model.ckpt you’re using.
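Installing the new weights is just a file copy; a minimal sketch (both example paths are assumptions for illustration, not paths from either repo):

```python
import os
import shutil

def install_model(src, dst):
    """Copy a trained checkpoint into place as the target repo's model.ckpt."""
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy(src, dst)
    return dst

# Example usage (hypothetical paths; point these at your actual checkout):
# install_model("logs/my_run/checkpoints/last.ckpt",
#               "stable-diffusion-webui/models/model.ckpt")
```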
Oh, and show us your results!
The training samples are meh, but I kind of expected that, since my reg images are from clip front, which I've been told is not the best method. I'm trying to get it to run on free Colab currently, but it requires more RAM and can't load the full model. So I'll try locally in a bit and hope the results look nice enough, or that it even runs.
Also, Emad said in the Discord chat that they'll be releasing guides this week, so HOPEFULLY they include some troubleshooting for this, because I'm basically setting Google's and RunPod's servers on fire.