Bagheera comments


Hunyuan-DiT feature branch plan

updated and rebased this PR on top of the Kolors support @sayakpaul

Hunyuan-DiT feature branch plan

@sayakpaul still interested in hunyuan-dit?

Hunyuan-DiT feature branch plan

@sayakpaul can't get the actual model prediction to work, but i've done the rest of what's needed to make validations happen. they don't look normal on macOS. it's all just...

Hunyuan-DiT feature branch plan

@sayakpaul i will assume there is no longer interest in this but feel free to pick it up again at some point if you like

InstructPix2Pix training script for SD3

would you like to give it a try? if you start on it, and open a pull request, we can all work on it together and finish it.

InstructPix2Pix training script for SD3

we could make a synthetic dataset using an older pix2pix model? or a controlnet? or even some proprietary option. which would be best for making instruct edit data?

InstructPix2Pix training script for SD3

channel-wise concat was shown not to work as well as sequence-level concat; see HiDream E1 and Flux Kontext's technical report. of course, the attention scale changes, and the model slows...
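
A rough sketch of the shape difference between the two conditioning styles mentioned above, and why sequence-level concat costs more in attention. The 8x VAE downscale, 2x2 patch size, and 16 latent channels are illustrative assumptions, not values taken from any specific model in this thread, and the helper names are hypothetical.

```python
def latent_tokens(height, width, vae_scale=8, patch=2):
    """Token count for one image, assuming an 8x VAE and 2x2 patches (illustrative)."""
    return (height // (vae_scale * patch)) * (width // (vae_scale * patch))


def channel_concat_shape(tokens, channels):
    # Reference latent stacked on the channel axis:
    # same number of tokens, twice the input width.
    return (tokens, 2 * channels)


def sequence_concat_shape(tokens, channels):
    # Reference latent appended as extra tokens:
    # same width, twice the sequence length.
    return (2 * tokens, channels)


def attention_pairs(tokens):
    # Full self-attention scores every token against every token.
    return tokens * tokens


n = latent_tokens(1024, 1024)
chan = channel_concat_shape(n, 16)
seq = sequence_concat_shape(n, 16)

print("tokens per image:", n)
print("channel-wise concat (tokens, channels):", chan)
print("sequence-level concat (tokens, channels):", seq)
print("attention cost ratio:", attention_pairs(seq[0]) / attention_pairs(chan[0]))
```

Under these assumptions, doubling the sequence length quadruples the number of attention pairs, which is one plausible reading of the slowdown the comment refers to.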

Qwen Image prompt encoding is not padding to max seq len

~~well, loss is at 3.6 without the fix, and normal range with it 🤔~~ the loss is due to my use of the VAE, not the text embed sequence length....

Qwen Image prompt encoding is not padding to max seq len

the problem is that because the text positions and img positions share the same RoPE (tricky) i can't give the text embeds purely the max sequence length, because then there...

Qwen Image prompt encoding is not padding to max seq len

if we run the pipeline with a 1024x1024 image **and** a very long prompt, it ~~simply crashes with the same kind of position errors.~~ silently truncates.
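
To make the padding and truncation behaviour under discussion concrete, here is a minimal sketch of padding token ids to a fixed length while warning on truncation instead of doing it silently. `pad_or_truncate`, `max_len`, and `pad_id` are hypothetical names for illustration; this is not the Qwen Image pipeline's code and it does not model the shared RoPE positions.

```python
import warnings


def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Return (ids, attention_mask, truncated) with len(ids) == max_len."""
    truncated = len(token_ids) > max_len
    if truncated:
        # Surface the truncation instead of dropping tokens silently.
        warnings.warn(
            f"prompt has {len(token_ids)} tokens; keeping the first {max_len}"
        )
    kept = list(token_ids[:max_len])
    pad_count = max_len - len(kept)
    # 1 marks a real token, 0 marks padding.
    mask = [1] * len(kept) + [0] * pad_count
    return kept + [pad_id] * pad_count, mask, truncated


ids, mask, truncated = pad_or_truncate([5, 6, 7], max_len=5)
print(ids, mask, truncated)
```

A caller could check the `truncated` flag and decide whether to shorten the prompt or raise an error rather than continue.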