Training support for HiDream-I1?

Open nitinh12 opened this issue 8 months ago • 17 comments

Can this be added for 48GB VRAM?

nitinh12 avatar Apr 10 '25 10:04 nitinh12

@kohya-ss @rockerBOO SimpleTuner has already started implementing this. Can you please add it too? I'm more used to sd-scripts.

nitinh12 avatar Apr 12 '25 10:04 nitinh12

+1 All other trainers have it as well

EClipXAi avatar Apr 18 '25 00:04 EClipXAi

@rockerBOO @kohya-ss AI Toolkit has also implemented this.

nitinh12 avatar Apr 18 '25 09:04 nitinh12

Thank you for your suggestion. HiDream-I1 is a very interesting model. It's good that other trainers have already implemented it, so we can refer to them.

However, we would also like to support FramePack in Musubi Tuner. We will consider the priority.

kohya-ss avatar Apr 18 '25 13:04 kohya-ss

I would prioritize HiDream.

nitinh12 avatar Apr 18 '25 13:04 nitinh12

@kohya-ss this got 6 thumbs up; will you prioritize it now?

nitinh12 avatar Apr 22 '25 08:04 nitinh12

I was excited for FramePack, but it turned out to be a lot of hype, to be honest. It's promising for the future once they train a WAN version, but since it's a HunyuanVideo-based model, the i2v consistency is as poor as we're used to, and it takes much longer to generate even a 5-second video.

The consistency claims were apparently overblown too. It can only maintain about 52 frames (2 seconds) of consistency because it doesn't see past that. So if you just want someone to dance on the spot for longer than 10 seconds, it manages that much better than cutting and stitching end frames together, but it's not going to generate one-minute short films or anything with consistent characters across scenes. What it's genuinely good at is avoiding accumulation errors over time.

It's also already been superseded by MAGI-1, which at least theoretically does what FramePack promised (real-world tests on their site didn't bear any resemblance to the demos, but maybe the free tier only offers the 4.5B version). MAGI-1 is way out of reach for training anyway, though: it's open source, but even inference requires 8x H100s.

Tophness avatar Apr 23 '25 03:04 Tophness

Also, I think T5 training isn't worth it for HiDream, and probably CLIP training isn't either; it seems LLaMA does the heavy lifting.

So it would be nice if we could train the text encoders for HiDream, maybe just the CLIP and LLaMA ones.

https://github.com/tdrussell/diffusion-pipe

diffusion-pipe also seems to have training working with 24GB VRAM.

Looking forward to when you'll work on HiDream :)

EClipXAi avatar Apr 27 '25 01:04 EClipXAi

I've almost finished the work related to FramePack, so I'd like to start working on sd-scripts issues and PRs, as well as HiDream-I1.

I can't promise when that will be, though. Thank you for your understanding.

kohya-ss avatar Apr 27 '25 02:04 kohya-ss

Any update? Have you started working on this? I am looking forward to this.

nitinh12 avatar May 06 '25 12:05 nitinh12

I'm very curious about how your HiDream training script is going, and I can't wait to try it out.

fengchunlvdragonplus avatar May 09 '25 01:05 fengchunlvdragonplus

any news?? I'm very excited to test it on kohya_ss!!!😎

dsienra avatar May 17 '25 04:05 dsienra

@kohya-ss Please add this. I see you're very active with Musubi Tuner, but please don't forget us.

nitinh12 avatar May 20 '25 11:05 nitinh12

I'm sorry for the delay. I'll try to find some time to work on Lumina and HiDream.

kohya-ss avatar May 20 '25 11:05 kohya-ss

Can you do Chroma too, please?

nanaj96 avatar May 21 '25 18:05 nanaj96

@kohya-ss SimpleTuner caches the text encoder and VAE outputs first and then unloads those models from the GPU, which saves a lot of VRAM and lets us train without quantizing. I'd love to see a similar implementation in kohya's scripts.

nitinh12 avatar May 30 '25 13:05 nitinh12
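For context, the "cache then unload" idea mentioned above can be sketched roughly as follows. This is a minimal illustration of the pattern, not SimpleTuner's actual code: every caption/image is run through the encoder once, only the outputs are kept, and the encoder is then dropped so its weights stop occupying memory. The class and function names here are hypothetical, and the placeholder `encode()` stands in for a real text encoder or VAE forward pass.

```python
import gc

class DummyEncoder:
    """Stands in for a text encoder / VAE; encode() is the expensive part."""
    def encode(self, item):
        # Placeholder for real embeddings (e.g. a tensor from a forward pass).
        return [ord(c) for c in item]

def precompute_caches(dataset, encoder):
    """Run every item through the encoder once; keep only the outputs."""
    return {item: encoder.encode(item) for item in dataset}

dataset = ["a photo of a cat", "a photo of a dog"]
encoder = DummyEncoder()
cache = precompute_caches(dataset, encoder)

# Unload: drop the encoder so its weights no longer occupy (GPU) memory.
# With PyTorch this step would be `encoder.to("cpu")` or `del encoder`,
# followed by `torch.cuda.empty_cache()`.
del encoder
gc.collect()

# The training loop then reads embeddings from the cache instead of
# re-encoding, so the text encoder / VAE never need to stay resident.
embedding = cache["a photo of a cat"]
```

The trade-off is disk/RAM for the cached outputs versus VRAM for the live models; for fixed captions and images the outputs never change, so caching once up front is safe.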

I'm gently and politely wondering if we can expect a HiDream addition to the kohya family?

foggyghost0 avatar Jun 28 '25 13:06 foggyghost0