Gabriel Mongaras
Nice catch! Looks like I got the indexing wrong here, encoding the batch dimension as the "time" dimension. I suppose that means the position in the diffusion process becomes useless...
I added a few warnings to mention the issue when running the code: https://github.com/gmongaras/Diffusion_models_from_scratch/blob/f2e76317d70eb565a953f4959f5781a010177318/src/blocks/PositionalEncoding.py#L19 I also changed the code to correctly index the batch and time dimensions: https://github.com/gmongaras/Diffusion_models_from_scratch/blob/f2e76317d70eb565a953f4959f5781a010177318/src/blocks/PositionalEncoding.py#L34 However since...
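For context, a minimal sketch of the intended behavior (this is an illustrative reimplementation, assuming a standard sinusoidal encoding over diffusion timesteps, not the repo's exact code): each timestep `t` in the batch should get its own encoding row, so the time values index the batch dimension per element rather than the batch being treated as the time axis.

```python
import math
import numpy as np

def timestep_encoding(t: np.ndarray, dim: int) -> np.ndarray:
    """t: (batch,) integer timesteps -> (batch, dim) sinusoidal encodings.

    Each batch element's timestep selects its own encoding row, which is
    the indexing the fix linked above restores.
    """
    half = dim // 2
    # Geometrically spaced frequencies, as in the original Transformer encoding
    freqs = np.exp(-math.log(10000.0) * np.arange(half) / half)
    args = t.astype(float)[:, None] * freqs[None, :]  # (batch, half)
    return np.concatenate([np.sin(args), np.cos(args)], axis=-1)

enc = timestep_encoding(np.array([0, 10, 500]), 128)
# enc.shape == (3, 128); distinct timesteps map to distinct rows
```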
> Apparently (tell me if I am wrong):
> — "loadImagenet64.py" needs "Imagenet64_train_part1.zip" and "Imagenet64_train_part2.zip".
> Imagenet64x64 does not have these files. It rather has:
> train_data_batch_1, train_data_batch_2, train_data_batch_3... etc...
Oh, that makes sense. So the Kaggle dataset is probably in a different format from the ImageNet dataset, which is why you're running into issues loading the data. I...
I think `128116x12288` is batch size by image size. So you have a batch of `128116` images, each flattened to `12288` values, which is exactly 3x64x64.
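A quick sanity check of that arithmetic (the shapes here are taken from the comment above; the tiny stand-in batch is just for illustration):

```python
import numpy as np

batch_size, flat_size = 128116, 12288
# 12288 flattened values per image correspond to 3-channel 64x64 images
assert flat_size == 3 * 64 * 64

# A small stand-in batch with the same per-image size reshapes cleanly:
data = np.zeros((4, flat_size), dtype=np.uint8)  # pretend 4 images
images = data.reshape(-1, 3, 64, 64)
print(images.shape)  # (4, 3, 64, 64)
```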
If I'm remembering correctly, when I was testing out the voice cloning model, I used this path to reference the model I wanted to load in. However, since the voice...
I had this exact same issue. The main problem stems from the internal tokenizer padding on the right, thus using the pad tokens to generate the output. You can fix...
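To illustrate the padding issue in plain Python (the actual fix depends on the tokenizer library in use; for Hugging Face tokenizers it is typically setting `tokenizer.padding_side = "left"` for generation): with right padding, the pad tokens sit between the prompt and the generated continuation, so the model continues from pads instead of from the real prompt.

```python
PAD = 0  # hypothetical pad token id for illustration

def pad_batch(seqs, side="right"):
    """Pad token-id sequences to equal length on the given side."""
    max_len = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        pads = [PAD] * (max_len - len(s))
        out.append(s + pads if side == "right" else pads + s)
    return out

prompts = [[5, 6], [7, 8, 9, 10]]
pad_batch(prompts, "right")  # [[5, 6, 0, 0], ...] — generation would start after pads
pad_batch(prompts, "left")   # [[0, 0, 5, 6], ...] — last token is a real prompt token
```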