Flux.2 is downloading the transformer and VAE again even though they were already downloaded.

Open mcDandy opened this issue 4 weeks ago • 8 comments

This is for bugs only

Did you already ask in the discord?

No

You verified that this is a bug and not a feature request or question by asking in the discord?

Yes

Describe the bug

After restarting my computer and starting a job, Flux.2 is being downloaded AGAIN even though it is already present twice in .cache/models...Flux.1/snapshots. I have already trained with it, and now it behaves as if this were the first time. The only thing I changed was the offload % of the job I wanted to continue training. It is also downloading the VAE (I copied the file from snapshots to blobs under the correct name to circumvent 6 hours of downloading).

mcDandy avatar Nov 27 '25 12:11 mcDandy

I have the exact same issue, except with Z Image. Every time I start a new run, it downloads the transformer again even though it is in the cache. So I have to wait 30 minutes every time I start a run for 20+ GB of data to download again.

mwelliott avatar Dec 02 '25 00:12 mwelliott

I think the tool likely tries to fetch the latest version of the models from scratch for each new upstream git commit. If the upstream repo on Hugging Face adds a new commit, the model is treated as new even if the only thing that was updated was the README.

Normally this wouldn't be a problem if the repo was simply cloned via git.

But it looks like the toolkit clones from scratch each time and stores the raw cloned data to separate snapshot directories, for example:

- C:\Users\<username>\.cache\huggingface\hub\models--Tongyi-MAI--Z-Image-Turbo\snapshots\466b7f17ccb9a2d3656152a12652893a15b02ced
- C:\Users\<username>\.cache\huggingface\hub\models--Tongyi-MAI--Z-Image-Turbo\snapshots\78771b7e11b922c868dd766476bda1f4fc6bfc96

These correspond to git commits.
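
To check how many snapshot folders have piled up for a model, and roughly how much data each one holds, something like the following could help. It is a minimal sketch using only the Python standard library; the function name `list_snapshots` is hypothetical, and the path in the example assumes the default cache location shown in the paths above.

```python
from pathlib import Path


def list_snapshots(repo_cache_dir: Path) -> list[tuple[str, int, int]]:
    """Return (commit_hash, file_count, total_bytes) for each snapshot folder."""
    snapshots_dir = repo_cache_dir / "snapshots"
    results = []
    if not snapshots_dir.is_dir():
        return results
    for snapshot in sorted(snapshots_dir.iterdir()):
        if not snapshot.is_dir():
            continue
        # Symlinked files are measured at their target's size, so the
        # totals can overstate real disk use when snapshots share blobs.
        files = [p for p in snapshot.rglob("*") if p.is_file()]
        total = sum(p.stat().st_size for p in files)
        results.append((snapshot.name, len(files), total))
    return results


if __name__ == "__main__":
    # Assumed location: the default hub cache under the home directory.
    repo = (
        Path.home() / ".cache" / "huggingface" / "hub"
        / "models--Tongyi-MAI--Z-Image-Turbo"
    )
    for commit, count, size in list_snapshots(repo):
        print(f"{commit}  {count} files  {size / 1e9:.2f} GB")
```

If two snapshot folders report nearly the same size, that would be consistent with the same weights having been stored once per commit.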

AndrejMitrovic avatar Dec 02 '25 01:12 AndrejMitrovic

Thank you Andre! That makes a ton of sense and has reduced my frustration now that I know what is going on. Really appreciate the info, good sir.

mwelliott avatar Dec 02 '25 02:12 mwelliott

Bump. As above, it keeps repeatedly downloading the safetensors, VAE, etc. Over 90 GB so far for my Z Image folder alone. I'd rather have an outdated model that I manually update if needed than have it constantly waste bandwidth, hard drive space, and time downloading.

Section59 avatar Dec 02 '25 05:12 Section59

(Windows) When this happens, I start a job, then I cancel it during the model download. Then I go into the /.cache folder and move the previously downloaded model into the folder of the day. Finally, I start the job again.
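
The manual steps above could be scripted roughly as follows. This is only a sketch under several assumptions: the job has already been stopped, "the folder of the day" is taken to be the most recently modified folder under `snapshots`, and the helper name `reuse_previous_files` is hypothetical. It copies rather than moves, so the older snapshot stays intact, and it defaults to a dry run.

```python
import shutil
from pathlib import Path


def reuse_previous_files(snapshots_dir: Path, dry_run: bool = True) -> list[Path]:
    """Fill the newest snapshot folder with files found only in older ones.

    Returns the destination paths that were (or, in a dry run, would be) created.
    """
    snapshots = sorted(
        (p for p in snapshots_dir.iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
    )
    if len(snapshots) < 2:
        return []
    newest, older = snapshots[-1], snapshots[:-1]

    planned: dict[Path, Path] = {}  # destination -> source
    for old in reversed(older):  # prefer the most recent older snapshot
        for src in old.rglob("*"):
            if not src.is_file():
                continue
            dst = newest / src.relative_to(old)
            # Never overwrite anything that is already in the newest folder.
            if dst.exists() or dst in planned:
                continue
            planned[dst] = src

    if not dry_run:
        for dst, src in planned.items():
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
    return sorted(planned)
```

Note that a partially downloaded file already sitting in the newest folder is skipped rather than replaced, and that copying old weights forward means the model stays at the older version if the upstream files really did change.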

BiouP avatar Dec 02 '25 07:12 BiouP

There must be a better solution for that, right? Like better file naming or an MD5 checksum system. It takes ages for me to redownload all the model files.
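
As an illustration of the checksum idea, the sketch below compares two snapshot folders by content hash and reports which files are byte-identical, meaning they would not have needed a second download. It uses SHA-256 from the Python standard library instead of MD5; the function names are hypothetical, and this is not a description of how the toolkit or the Hugging Face cache actually decides what to fetch.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte weights do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unchanged_files(old_snapshot: Path, new_snapshot: Path) -> list[str]:
    """Relative paths present in both folders with identical content."""
    same = []
    for old_file in sorted(old_snapshot.rglob("*")):
        if not old_file.is_file():
            continue
        rel = old_file.relative_to(old_snapshot)
        new_file = new_snapshot / rel
        if new_file.is_file() and sha256_of(old_file) == sha256_of(new_file):
            same.append(rel.as_posix())
    return same
```

Hashing 20+ GB of weights takes a while, but it only reads from local disk, so it should still be far quicker than downloading the files again.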

edankwan avatar Dec 02 '25 09:12 edankwan

> (Windows) When this happens, I start a job, then I cancel it during the model download. Then I go into the /.cache folder and move the previously downloaded model into the folder of the day. Finally, I start the job again.

Damn, I didn't think that would work so I just deleted and am downloading again lol.

Well, it's good to see I'm not the only one and it's not some odd error. I'll do it this way next time. Thanks!

Urabewe avatar Dec 02 '25 19:12 Urabewe

> (Windows) When this happens, I start a job, then I cancel it during the model download. Then I go into the /.cache folder and move the previously downloaded model into the folder of the day. Finally, I start the job again.

I gave this a try. Unfortunately, when I stop a job while it is downloading, it doesn't stop until the download 100% completes. It just sits there spamming "Stopping job" in the terminal. If I force close the app and restart, it just continues to try to stop the job until it is completed. Ugh.

Edit: just realized I can force-mark a job as 'stopped', which did the trick. Woo!

mwelliott avatar Dec 03 '25 01:12 mwelliott