Phil Wang

Results 814 comments of Phil Wang

that's interesting. I'm not sure; there is a subtle difference in the resnet blocks. I'm using the GLIDE style architecture here with norm, activation, then project. However, the original ddpm...
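For context, the ordering being described (norm, then activation, then projection) can be sketched roughly as follows. This is a minimal numpy illustration of the block ordering only, not the actual GLIDE or repo code; all function names and shapes here are my own assumptions.

```python
import numpy as np

def norm(x, eps=1e-5):
    # simplified normalization over the feature axis (stand-in for group norm)
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def silu(x):
    # SiLU / swish activation
    return x * (1.0 / (1.0 + np.exp(-x)))

def glide_style_block(x, weight):
    # GLIDE-style ordering: norm -> activation -> projection
    h = norm(x)
    h = silu(h)
    h = h @ weight   # the "project" step (a linear / 1x1-conv-like projection)
    return x + h     # residual connection

x = np.random.randn(2, 8)
w = np.random.randn(8, 8) * 0.02
out = glide_style_block(x, w)
```

The subtle point in question is just where the projection sits relative to the norm and activation; the original ddpm blocks arrange these steps differently.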

@babysor yea, just jump in with a pull request

@Diaz1980 no there is not

@pfeatherstone i think i allow for bidirectional shifting, maybe that's why. i can check later

@aliabid2243 Hi Abid! Which version of pytorch are you on?

what is new in the most recent paper that is not in the repo?

@yiyixuxu :rocket: looks great YiYi! I added a link to it from the readme

@Hosein47 do you have experiments set up to measure oversmoothing? part of me wonders if it is even a problem worth solving, given chatgpt has shown scale and data matters way...

try https://github.com/lucidrains/x-transformers#gated-residual for starters, and if you see it alleviate oversmoothing, i can add a simpler technique
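The gated residual being suggested replaces the plain `x + branch(x)` update with a learned gate that interpolates between the residual stream and the branch output, in the spirit of highway networks. Below is a minimal numpy sketch of the idea only; the gate parameterization, names, and shapes are my own assumptions, not the x-transformers implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_residual(x, branch_out, w_gate, b_gate):
    # gate is computed from both the residual stream and the branch output
    gate = sigmoid(np.concatenate([x, branch_out], axis=-1) @ w_gate + b_gate)
    # interpolate instead of adding: gate=1 keeps the residual stream,
    # gate=0 takes the branch output (a plain residual would be x + branch_out)
    return gate * x + (1.0 - gate) * branch_out

dim = 8
x = np.random.randn(2, dim)
branch_out = np.random.randn(2, dim)   # e.g. an attention or feedforward output
w_gate = np.zeros((2 * dim, dim))
b_gate = np.full(dim, 3.0)             # bias the gate toward the residual at init
y = gated_residual(x, branch_out, w_gate, b_gate)
```

Initializing the gate bias positive keeps the block close to identity early in training, which is one reason gating is proposed as a remedy for oversmoothing: layers can learn to pass tokens through unchanged rather than repeatedly averaging them.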

@ghpkishore architecture-wise it is pretty much complete. there are no plans to train it, as it seems like everyone is preferring the denoising diffusion approach

well, will you look at that: https://arxiv.org/abs/2310.05737 — guess i'll put more work into this