Daniel Perry comments

Results: 9 comments of Daniel Perry

Training Error: TypeError: true_fun and false_fun output must have identical types

The error appears to come from the checkpoint I passed to `train.py` via `model_name_or_path`. Starting fine-tuning from checkpoint `dalle-mini/dalle-mini/mega-1-fp16:latest`, I get the error mentioned, but...

Implementation of lookahead

Currently untested:
* Multi-GPU setup, since I only have one GPU
* Theoretically supports loading from checkpoints that didn't use lookahead originally

Implementation of lookahead

Tested Multi-GPU using Azure and verified that it at least ran for >100 iterations and produced expected outputs. That's about as much as I can validate for now.

Implementation of lookahead

Also tested resuming training after starting without lookahead to confirm that works as well.
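For readers unfamiliar with the technique, here is a minimal sketch of the general Lookahead update rule and of resuming from a checkpoint that was saved without it. It is not the code from the PR: the `Lookahead` class, its plain-SGD inner step, the checkpoint keys (`params`, `slow`, `step_count`), and the default values of `k` and `alpha` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Lookahead:
    """Toy Lookahead wrapper around plain SGD on a list of floats (hypothetical)."""

    params: List[float]                 # fast weights, updated every step
    lr: float = 0.1                     # inner SGD learning rate (assumed value)
    k: int = 5                          # fast steps between slow-weight syncs
    alpha: float = 0.5                  # interpolation factor for slow weights
    step_count: int = 0
    slow: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # No slow weights yet (fresh run, or a checkpoint saved without
        # lookahead): start them at the current fast weights.
        if not self.slow:
            self.slow = list(self.params)

    def step(self, grads: List[float]) -> None:
        # Inner optimizer step on the fast weights.
        self.params = [p - self.lr * g for p, g in zip(self.params, grads)]
        self.step_count += 1
        # Every k steps, pull the slow weights toward the fast weights,
        # then reset the fast weights to the new slow weights.
        if self.step_count % self.k == 0:
            self.slow = [s + self.alpha * (p - s)
                         for s, p in zip(self.slow, self.params)]
            self.params = list(self.slow)

    def state_dict(self) -> Dict[str, object]:
        return {"params": list(self.params),
                "slow": list(self.slow),
                "step_count": self.step_count}

    @classmethod
    def from_checkpoint(cls, ckpt: Dict[str, object], **kwargs) -> "Lookahead":
        # "slow" and "step_count" are optional so that checkpoints written
        # without lookahead still load.
        return cls(params=list(ckpt["params"]),
                   slow=list(ckpt.get("slow", [])),
                   step_count=int(ckpt.get("step_count", 0)),
                   **kwargs)
```

Under these assumptions, `Lookahead.from_checkpoint({"params": [1.0]}, k=2)` starts with slow weights equal to the loaded weights, so the first sync behaves as if lookahead had been enabled from that point on.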

Implementation of lookahead

Friendly ping to @lucidrains 😄. My own testing with lookahead resulted in excellent improvements in outputs when training without attention, and I'm interested to see whether others see similar improvements. My...

Implementation of lookahead

It looks like something in PyTorch changed in the past year that breaks the code. I promise it did work when I made the PR 😄. Unfortunately, I...

Unable to train to convergence (small dataset)

I'm noticing the same on my own dataset of ~175k text-image pairs, so maybe it's not a dataset size issue (or 200k is also not enough)? To add my own...

Unable to train to convergence (small dataset)

My attempt with the larger batch size is still going without any NaNs after about 62 hours of training on my 3090. Currently, the loss is hovering around...

Unable to train to convergence (small dataset)

@jacobwjs Unfortunately, my machine power-cycled itself for some reason, so training on my x-clip model has stopped for now. I wanted to test out lucidrains's imagen model with my text-image...