Mihir Patel comments

Results: 172 comments of Mihir Patel

Inconsistency in Return Value of Functional Algorithms

Closed as done

Support for Deepspeed stage-3

@ananyahjha93 Note that we just updated Composer to support the most recent DeepSpeed release, in case this is still an issue.

Support for Deepspeed stage-3

Closing for now since we've updated DeepSpeed.

Modular Attention

Closing for now as we're tracking elsewhere, but it's low priority. We're open to community PRs!

Torch2 updt

> Can we make sure that CI (github) tests are run on torch 2 images as part of this change?

+1. Take a look at what we do in Composer

Different results between AITemplate example and DreamStudio for SD 2.0

Not sure if related, but I'm seeing significant differences with the groupnorm implementation in AITemplate on the order of `1e-4` compared to PyTorch. Not quite sure why since the accumulation...

Does Composer support a best-checkpoint saver that monitors checkpoints for the best metrics or losses?

Great to hear! Please let me know if you encounter any further issues.

Does Composer support a best-checkpoint saver that monitors checkpoints for the best metrics or losses?

> @mvpatel2000 out of curiosity, may I ask which GPUs you used to run this test? It's much faster than what I got. Are these H100s? I was...

Does Composer support a best-checkpoint saver that monitors checkpoints for the best metrics or losses?

> We are also using 2xA100 with the same code snippet as shared above, but our throughput is quite a bit lower, around 4.1 ba/sec, whereas you reach >6.6 ba/sec. ```...

Slow startup and OOMs when using device_train_microbatch_size with torch.compile

Thanks for flagging this!

> My best guess at a clean solution is to determine the microbatch size with the uncompiled model so that only one compilation needs to be...