Jiani Wang
> I think I may have found the reason for this. For the last step in each training job, the loss seems to be incorrect, at least in the plot....
> We would expect the lr to increase in each step by 0.0008/5 = 0.00016

I tried to reproduce with the `main` branch. Here's my setting: ``` [optimizer] name = "AdamW"...
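For reference, here is a minimal standalone sketch of that arithmetic (this is not torchtitan's actual scheduler; the model, optimizer, and `warmup_steps` below are illustrative assumptions) showing why a linear warmup to lr = 8e-4 over 5 steps should raise the lr by 8e-4 / 5 = 1.6e-4 per step:

```python
import torch

model = torch.nn.Linear(8, 8)
max_lr, warmup_steps = 8e-4, 5
optimizer = torch.optim.AdamW(model.parameters(), lr=max_lr)

# LambdaLR scales the base lr by the returned factor: ramp linearly from
# 1/warmup_steps up to 1.0, then stay flat.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min((step + 1) / warmup_steps, 1.0)
)

for step in range(8):
    optimizer.step()  # normally preceded by a forward/backward pass
    print(step, scheduler.get_last_lr()[0])
    scheduler.step()
# Expected: 0.00016, 0.00032, 0.00048, 0.00064, 0.0008, then constant 0.0008
```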
> @wwwjn Have you encountered this issue when running on the devgpu? Landing the PR looks not harmful but wants to understand why this is required specific to Flux encoder....
Hi @cli99, I want to follow up on this PR, and thanks for contributing! I haven't run into this issue during my training. Do you know under what circumstances the tensor...
Closing this PR because I cannot reproduce the issue; ignoring it for now.
Nice catch! LGTM, please sign the CLA so the PR can be processed.
@CarlosGomes98 one quick note: `flux-train` is a little behind the main branch, so let's just address the comments and create a PR against the main branch instead.
> > By turning off classifier-free guidance (in the dataloader), eval steps, and loading from the downloaded dataset, the hash of each batch is identical across different runs. The issue is around deterministic...
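For anyone trying to reproduce this, a hedged sketch of the kind of batch-hashing check described above (the helper name and batch layout are hypothetical, not taken from the torchtitan code):

```python
import hashlib
import torch

def batch_hash(batch: dict[str, torch.Tensor]) -> str:
    """Return a stable digest of a batch of tensors, key by key.

    Assumes CPU-convertible dtypes (e.g. int64 token ids, float32 images);
    bfloat16 tensors would need a cast or byte reinterpretation first.
    """
    h = hashlib.sha256()
    for key in sorted(batch):
        t = batch[key].detach().cpu().contiguous()
        h.update(key.encode())
        h.update(t.numpy().tobytes())
    return h.hexdigest()

# Usage idea: log batch_hash(batch) at every training step in two runs and diff
# the logs. Identical hashes mean the dataloader side is deterministic, so any
# remaining divergence must come from the model/optimizer side.
```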
cc @CarlosGomes98 @tianyu-l, here's a centralized tracker of Flux issues and next steps.
Preprocessing code is here: https://github.com/pytorch/torchtitan/tree/flux-train. The preprocessed data will take a huge amount of storage, because the generated T5 encoding for each sample is a 256 * 4096 tensor.
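As a rough back-of-the-envelope sketch (the dtype and sample counts below are assumptions for illustration, not measured numbers), the storage adds up quickly:

```python
# 256 x 4096 comes from the comment above; assume bfloat16 (2 bytes per value).
seq_len, hidden_dim, bytes_per_value = 256, 4096, 2
per_sample = seq_len * hidden_dim * bytes_per_value  # 2,097,152 bytes ~= 2 MiB

for num_samples in (100_000, 1_000_000):
    total_gib = num_samples * per_sample / 2**30
    print(f"{num_samples:>9,} samples -> ~{total_gib:,.0f} GiB")
# -> ~195 GiB for 100k samples, ~1,953 GiB (about 2 TiB) for 1M samples
```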