Add validation and batched inference to flux

Open CarlosGomes98 opened this issue 7 months ago • 3 comments

  • Add val loss
  • Add batched inference

Ideally we would also add COCO2014 as a dataset. However, I haven't been able to find an HF dataset containing both the images and the captions. So, for now, I've added a dataset that is just the first 30k samples of the training dataset, for functional verification.
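
For illustration, the stand-in validation split described above could be sketched roughly as follows: take the first 30k samples from a streaming training dataset without loading the whole stream. This is a minimal sketch under stated assumptions, not the code in this PR; `take_first_n` and `fake_training_stream` are hypothetical names, and the real data would be an image/caption stream rather than generated dicts.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

VAL_SAMPLES = 30_000  # size of the stand-in validation subset mentioned above


def take_first_n(samples: Iterable[T], n: int = VAL_SAMPLES) -> Iterator[T]:
    """Yield at most the first n samples without materializing the stream."""
    return islice(samples, n)


def fake_training_stream(size: int) -> Iterator[dict]:
    """Hypothetical stand-in for a streaming image/caption training dataset."""
    for i in range(size):
        yield {"id": i, "caption": f"caption {i}"}


# Build the validation subset from the head of the (assumed) training stream.
val_subset = list(take_first_n(fake_training_stream(100_000)))
print(len(val_subset), val_subset[0]["id"], val_subset[-1]["id"])
```

Because the subset is drawn from the training data, a loss computed on it only verifies that the validation path runs; it says nothing about generalization.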

~~This also includes changes from https://github.com/pytorch/torchtitan/pull/1138~~

CarlosGomes98 avatar May 19 '25 16:05 CarlosGomes98

Hi @CarlosGomes98!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

facebook-github-bot avatar May 19 '25 16:05 facebook-github-bot

Thanks for working on this!

It seems a lot of good stuff is being added. While I can clearly see the value of most changes, to be honest it's a bit difficult for reviewers to keep track of all the changes and their motivations.

Do you think it's doable to split the changes into several PRs, each with its own theme and documentation as PR summary / doc string / comments?

Yes, it did grow a bit out of hand. I can definitely split it into at least inference and validation. I'll see if I can make it more granular than that.

CarlosGomes98 avatar May 21 '25 06:05 CarlosGomes98

@CarlosGomes98 one quick note: flux-train is a little bit behind the main branch, so let's just resolve the comments and create a PR against the main branch instead.

wwwjn avatar May 21 '25 21:05 wwwjn

Closing this PR and splitting into smaller ones

CarlosGomes98 avatar May 27 '25 14:05 CarlosGomes98