Bagheera

447 comments by Bagheera

updated and rebased this PR on top of the Kolors support @sayakpaul

@sayakpaul still interested in hunyuan-dit?

@sayakpaul can't get the actual model prediction to work, but i've done the rest of the needful to make validations happen. they don't look normal on macOS. it's all just...

@sayakpaul i will assume there is no longer interest in this but feel free to pick it up again at some point if you like

would you like to give it a try? if you start on it, and open a pull request, we can all work on it together and finish it.

we could make a synthetic dataset using an older pix2pix model? or a controlnet? or even some proprietary option. which would be best for making instruct edit data?

channel-wise concat was shown not to work as well as sequence-level concat; see HiDream E1 and Flux Kontext's technical report. of course, the attention scale changes, and the model slows...
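
A rough sketch of why the two conditioning styles differ in cost, using shape arithmetic only. The helper name, the 16-channel latent, the 128x128 latent size, and the 2x2 patch are all illustrative assumptions, not values taken from HiDream E1 or Flux Kontext, and text tokens are ignored.

```python
def conditioning_shapes(channels, height, width, patch=2):
    """Compare token count and per-token width for two ways of feeding a reference latent.

    Hypothetical helper for illustration; not part of any library.
    """
    tokens_per_image = (height // patch) * (width // patch)
    features_per_token = channels * patch * patch
    # channel-wise: reference latent is stacked on the channel axis,
    # so the sequence length is unchanged but each token is twice as wide.
    channel_wise = {"tokens": tokens_per_image, "in_channels": 2 * features_per_token}
    # sequence-level: reference latent is patchified separately and appended,
    # so each token keeps its width but the sequence doubles.
    sequence_level = {"tokens": 2 * tokens_per_image, "in_channels": features_per_token}
    return channel_wise, sequence_level


cw, sl = conditioning_shapes(channels=16, height=128, width=128)
# self-attention over image tokens scales with the square of the sequence length
attention_cost_ratio = (sl["tokens"] / cw["tokens"]) ** 2
print(cw, sl, attention_cost_ratio)
```

Under these assumptions the sequence-level variant doubles the image token count, so the image-to-image attention term costs roughly four times as much, which is one way to read the slowdown mentioned above.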

~~well, loss is at 3.6 without the fix, and normal range with it 🤔~~ the loss is due to my use of the VAE, not the text embed sequence length....

the problem is that because the text positions and img positions share the same RoPE (tricky) i can't give the text embeds purely the max sequence length, because then there...

if we run the pipeline with a 1024x1024 image **and** a very long prompt, it ~~simply crashes with the same kind of position errors.~~ silently truncates.
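
One possible reading of the shared-position problem, as a minimal sketch: if text and image tokens draw from a single position budget, whatever the image uses is unavailable to the prompt, and the overflow can at least be reported instead of dropped silently. The function name, the 8x VAE downscale, the 2x2 patch, and the `max_positions` value are assumptions for illustration, not the pipeline's actual numbers.

```python
import warnings


def split_position_budget(max_positions, image_px, prompt_tokens, vae_scale=8, patch=2):
    """Hypothetical check: how many prompt tokens fit once the image has taken its positions."""
    side = image_px // (vae_scale * patch)
    image_tokens = side * side
    text_budget = max(max_positions - image_tokens, 0)
    kept = min(prompt_tokens, text_budget)
    dropped = prompt_tokens - kept
    if dropped:
        # make the truncation loud rather than silent
        warnings.warn(f"prompt truncated: {dropped} of {prompt_tokens} tokens dropped")
    return {"image_tokens": image_tokens, "text_budget": text_budget,
            "kept": kept, "dropped": dropped}


# max_positions=4352 is a made-up budget chosen only to trigger the overflow
result = split_position_budget(max_positions=4352, image_px=1024, prompt_tokens=512)
print(result)
```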