Reproducing the results in the paper
Hi, I'm trying to reproduce the paper results, especially the 'cat_statue' concept shown in Figure 1, but it seems like some extra information is needed to do that.

Inversion and Generation Settings
I'll describe my settings first.
- Input samples: following the images in the figure, I put 3 images ('2.jpg', '3.jpg', and '6.jpg') into the 'cat_statue_236' folder and used the folder as a training dataset. No preprocessing was applied.

- Change in 'main.py': 'torch.backends.cudnn.deterministic = True' was added for reproducibility.
...
    cfgdir = os.path.join(logdir, "configs")
    seed_everything(opt.seed)
    torch.backends.cudnn.deterministic = True
    try:
...
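Together with the lightning.trainer.benchmark=False override in the inversion command below, the determinism-related settings this run relies on boil down to the following (a sketch, not a verbatim excerpt from main.py):

# Sketch (not a verbatim excerpt) of the determinism settings this run relies on:
import torch
from pytorch_lightning import seed_everything

seed_everything(seed=23)                    # main.py seeds with opt.seed; the default is 23
torch.backends.cudnn.deterministic = True   # the line added to main.py above
torch.backends.cudnn.benchmark = False      # what lightning.trainer.benchmark=False sets under the hood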
- Inversion command: please check the command below. I set the initialization word as 'cat' and ran the command on 2 GPUs as written in the paper.
python main.py --base configs/latent-diffusion/txt2img-1p4B-finetune.yaml \
-t \
--actual_resume models/ldm/text2img-large/model.ckpt \
-n init_cat_gpu2_s0023 \
--gpus 0,1 \
--data_root datasets/cat_statue_236 \
--init_word cat \
lightning.trainer.benchmark=False # also for reproducibility
- Generation command: please check the command below. A seed argument was added for reproducibility, and all results were produced using 5,000 optimization steps following the implementation details in the paper. The attempted prompts were:
- A photo of *
- Banksy art of *
- Painting of a * riding a dragon
- Painting of two * fishing on a boat
- A * themed lunchbox
python scripts/txt2img.py --ddim_eta 0.0 \
--n_samples 8 \
--n_iter 2 \
--scale 10.0 \
--ddim_steps 50 \
--embedding_path logs/cat_statue_236<now>_init_cat_gpu2_s0023/checkpoints/embeddings_gs-4999.pt \
--ckpt_path models/ldm/text2img-large/model.ckpt \
--prompt <prompt> \
--seed 97 # just a random number
Generation Results
With the inversion and generation settings above, I obtained the results below. (top: my results, bottom: results from the supplementary)
- A photo of *

- Banksy art of *

- Painting of a * riding a dragon

- Painting of two * fishing on a boat

- A * themed lunchbox

Questions
As you can see, my results are quite different from the ones in the paper. My questions are as follows.
- Could you check if there is anything to change in the settings above?
- The seed for the inversion is 23 by default. Should I change the seed number? It would be of great help if you could provide the seed for the paper results.
- Other than the settings above and the inversion seed, are there any settings that I missed?
From a brief comparison, the config and code seem to match the version I used for training the cats.
The only difference I can think of is the seed, or the image set (there's a link to our full sets in the repo's README; you can find them here: https://drive.google.com/drive/folders/1fmJMs25nxS_rSNqS5hTcRdLem_YQXbq5). I'll retrain to make sure there's no issue. In the meantime, just to make sure: are you using the same batch size? The learning rate is auto-scaled according to the batch size (roughly as in the sketch below), which might also be impacting the results.
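For reference, the auto-scaling in main.py amounts to something like this (a sketch, not the repo's exact code; the function name is just for illustration):

def scaled_lr(base_lr, batch_size, ngpu, accumulate_grad_batches=1):
    # Effective LR = grad-accumulation steps * number of GPUs * batch size * base LR
    return accumulate_grad_batches * ngpu * batch_size * base_lr

# e.g., assuming a base LR of 5.0e-03: batch size 4 on 2 GPUs gives an effective LR of 0.04,
# while batch size 2 on a single GPU gives 0.01, so a different batch size shifts the effective LR.
print(scaled_lr(5.0e-03, batch_size=4, ngpu=2))  # 0.04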
Oh, I didn't know I should use all 7 images in the 'cat_statue' folder. I'll retrain as well. And the batch size was 4 by default, which matches the paper.
Another question: for the inversion of the concepts other than 'cat_statue', did you also use all of the images in the corresponding Google Drive folder, not just the ones shown in the figure?
Indeed. We note in some places in the paper that the figures show only exemplars from the full sets, but I'll make a note to better highlight it when we upload a revision. Regardless, you have the full sets in that drive.
Some of the sets have associated copyrights so I didn't upload them, but if there's any particular set which is not up and you want the images for, let me know and I'll send it to you.
Here we go:
Checkpoint: cat_26_09.zip
a painting of a * riding a dragon

a * themed lunchbox

banksy art of *

a photo of *

painting of two * fishing on a boat

I'm using the default seed (23). That said, I didn't clone a fresh repo and I have some local code changes (this code path doesn't go through them, but they might be affecting the seeding results). Let me know if you still can't get it to work with the full set, and I'll re-run on a fresh clone and also give you a seed so you can verify yourself.
Well, I retrained with the full set and these are the results I get.
- A photo of *

- Banksy art of *

- Painting of a * riding a dragon

- Painting of two * fishing on a boat

- A * themed lunchbox

Still, my results fall short of the paper's. Could you try the inversion on a fresh clone and see if the paper results are reproducible?
Sure. Is the main.py change above the only adjustment you made? I'll do the same locally to make sure I'm getting your results with the default seed.
Thank you. The adjustments are: the main.py change above; deleting the code below from SetupCallback.on_pretrain_routine_start(), due to some file-saving errors while running on multiple GPUs; and adding a seed argument to scripts/txt2img.py.
else:
    # ModelCheckpoint callback created log directory --- remove it
    if not self.resume and os.path.exists(self.logdir):
        dst, name = os.path.split(self.logdir)
        dst = os.path.join(dst, "child_runs", name)
        os.makedirs(os.path.split(dst)[0], exist_ok=True)
        try:
            os.rename(self.logdir, dst)
        except FileNotFoundError:
            pass
Just a heads up that I haven't forgotten you. I'm also seeing an issue with a fresh install of everything + your modifications + new environment from scratch.
I'm trying to track down the difference, but I'm doing a bunch of things in parallel so it might take a few days.
If you need this urgently and want me to send you the closest snapshot of the code to the version we used for this experiment, let me know. I ran that version on 2-3 seeds and it seems fine.
@rinongal
Thanks for sharing your checkpoint. Your example above made it crystal clear.
Just a note: the latest code from rinongal/textual_inversion.git on the main branch does not support the --seed CLI arg.
txt2img.py: error: unrecognized arguments: --seed 97
Without that arg, it works great.
Is there a different branch that supports setting the seed via the CLI?
There's no separate branch for setting the seed via the CLI, but it's trivial to add. You just copy the seed code from stable_txt2img.py (adding it to the parser, plus the import and the call to seed_everything). If you need help with that, let me know and I'll update the repo.
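For reference, the addition amounts to roughly the following (a sketch following the pattern in stable_txt2img.py; the exact default value and placement may differ):

# In scripts/txt2img.py -- sketch of the seed addition, mirroring scripts/stable_txt2img.py:
from pytorch_lightning import seed_everything  # add near the other imports

parser.add_argument(
    "--seed",
    type=int,
    default=42,
    help="the seed (for reproducible sampling)",
)

opt = parser.parse_args()
seed_everything(opt.seed)  # call right after parsing the args, before sampling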
@rinongal It would be great if I could use your code. If you don't mind, could you send it to [email protected]?
Sent. If anyone else needs the same code version before a fix is out, please let me know.
@rinongal Thank you very much.
So the current code is unable to achieve the same editability? Why is that? Will the code get an update to fix this?
@1blackbar Still trying to find where the issue stems from. At the moment this seems like an LDM-only issue, which likely came about when I added the partial SD support. I'm adding back my changes one-by-one to see where things break, but it's a lengthy process.
@rinongal I tried to find the cause on my end, and aside from the training templates, the training seems sensitive to the PIL version. Which PIL version did you use for the training?
@shadow2496 Training templates do not appear to be the source. It was the first thing I verified when I pulled the old code.
I've got one environment running pillow 9.1.1, and one running 9.0.1. I'm not seeing meaningfully different results in one compared to the other.
Hmm, comparing against the code you sent, I couldn't find any differences that affect training other than the conda environment and the training templates. The problem is that I can't reproduce the results even with your code. Maybe the issue comes from using different GPUs? (I'm using a TITAN RTX.)
- A photo of *. (training templates included periods)

- Banksy art of *. (training templates included periods)

- Painting of a * riding a dragon

- Painting of two * fishing on a boat

- A * themed lunchbox

Just to make sure the difference in conda environments is not an issue, could you send the YAML file produced by conda env export > environments.yaml?
environments_freeze.txt Here you go. I changed the extension to .txt since GitHub won't let me upload a yaml; just swap it back.
And here are the settings from the original run that produced the paper results. It's using 4 workers, but the checkpoint I sent here was made using 2, so that shouldn't be an issue.
cat2022-07-03T12-29-05-project.txt
I'm honestly still not sure what's causing the discrepancy here. I'll try setting things up in a colab so we can make sure we're getting synchronized results and work from there.
@shadow2496 Did you succeed in reproducing the results? I'm running into the same problem as you.
@rinongal One more question: in the environments_freeze.txt you provided above, transformers==4.3.1, but in https://github.com/rinongal/textual_inversion/blob/main/environment.yaml it's transformers==4.18.0. Which one is correct?
@CrossLee1 The environments_freeze.txt above is the one I used locally to train the cat checkpoint at the beginning of this thread. The one in the repo was just updated to reflect Stable Diffusion requirements. Unfortunately, we made quite a few changes to add support for SD, and since training here is seed-sensitive, these tend to impact the results as well.
I'm trying to find the time to look deeper into this or set up a colab where I can search for a seed that gives close results in a shared, consistent environment. Unfortunately I'm a bit bogged down with other things so until I get around to that, 'seed sensitivity' is the best answer I can give you.
@shadow2496 @CrossLee1 Did you succeed in reproducing the results? I'm running into the same problem.