Jiani Wang

37 comments by Jiani Wang

> I think I may have found the reason for this. For the last step in each training job, the loss seems to be incorrect, at least in the plot....

> We would expect the lr to increase in each step by 0.0008/5 = 0.00016. I tried to reproduce with the `main` branch. Here's my setting: ``` [optimizer] name = "AdamW"...
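For reference, here's a minimal sketch of the warmup behavior being described, not the torchtitan scheduler itself; the peak lr (0.0008), the 5 warmup steps, and the dummy model are assumptions taken from the quoted numbers:

```python
import torch

# A minimal sketch of the expected linear warmup, not the torchtitan scheduler.
# Assumptions: peak lr = 0.0008, 5 warmup steps, dummy model for illustration.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.0008)

warmup_steps = 5
# LambdaLR scales the base lr; this lambda ramps linearly and reaches 1.0 at the
# final warmup step, so each step should add 0.0008 / 5 = 0.00016.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min((step + 1) / warmup_steps, 1.0)
)

for step in range(warmup_steps):
    # lr applied at this step: 0.00016, 0.00032, 0.00048, 0.00064, 0.0008
    print(step, optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()
```

If the plotted lr does not grow by 0.00016 per step under a config like the one quoted above, the discrepancy is likely in how the last step is logged rather than in the schedule itself.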

> @wwwjn Have you encountered this issue when running on the devgpu? Landing the PR doesn't look harmful, but I want to understand why this is required specifically for the Flux encoder....

Hi @cli99 , I want to follow up on this PR, thanks for contributing! I haven't encountered this issue during my training; do you know under what circumstances the tensor...

Closing this PR for now because I cannot reproduce it. Ignore it for now.

Nice catch! LGTM, please sign the CLA so the PR can be processed.

@CarlosGomes98 one quick note: `flux-train` is a bit behind the main branch, so let's just resolve the comments and create a PR against the main branch instead.

> > By turning off classifier-free guidance (in the dataloader) and eval steps, and loading from the downloaded dataset, the hash of each batch is identical across different runs. The issue is around deterministic...
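For context, a sketch of the kind of per-batch hashing used to make that comparison; this is illustrative only, and the key names and batch layout are assumptions rather than the actual Flux dataloader output:

```python
import hashlib
import torch

def batch_hash(batch: dict) -> str:
    """Hash every tensor in a batch so two runs can be diffed step by step.
    Illustrative sketch only; key names and batch layout are assumptions,
    not the real Flux dataloader output."""
    h = hashlib.sha256()
    for key in sorted(batch):
        h.update(key.encode())
        t = batch[key].detach().cpu().contiguous()
        if t.dtype == torch.bfloat16:
            # numpy has no bfloat16, so upcast before grabbing the raw bytes
            t = t.float()
        h.update(t.numpy().tobytes())
    return h.hexdigest()

# Usage: log batch_hash(batch) at every step in two runs and diff the logs.
# Identical hashes mean the data side is deterministic and any divergence
# comes from elsewhere (e.g. non-deterministic kernels).
```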

cc @CarlosGomes98 @tianyu-l , here's a centralized tracker of Flux issues and next steps.

Preprocessing code is here: https://github.com/pytorch/torchtitan/tree/flux-train. The preprocessed data will take a huge amount of storage, because the generated T5 encoding for each sample is 256 * 4096.
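A back-of-the-envelope estimate of what that implies, assuming the encodings are stored in bf16 (2 bytes per value) and using a purely illustrative sample count:

```python
# Storage estimate for the preprocessed T5 encodings.
# Assumptions: values stored in bf16 (2 bytes each); sample count is illustrative.
seq_len, hidden_dim = 256, 4096                       # encoding shape per sample
bytes_per_value = 2                                   # bf16 / fp16
per_sample = seq_len * hidden_dim * bytes_per_value   # = 2 MiB per sample
num_samples = 1_000_000                               # hypothetical dataset size
total_gib = per_sample * num_samples / 2**30
print(f"{per_sample / 2**20:.1f} MiB/sample, ~{total_gib:,.0f} GiB for {num_samples:,} samples")
```

At 2 MiB per sample, even a one-million-sample dataset already needs roughly 2 TiB just for the T5 encodings, which is why the preprocessed data takes so much storage.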