Mihir Patel
Closed as done
@ananyahjha93 Note that we just updated Composer to support the most recent DeepSpeed release, in case this is still an issue.
Closing for now since we've updated deepspeed
Closing for now as we're tracking elsewhere, but it's low priority. We're open to community PRs!
> Can we make sure that CI (github) tests are run on torch 2 images as part of this change?

+1. Take a look at what we do in Composer
Not sure if related, but I'm seeing significant differences with the groupnorm implementation in AITemplate on the order of `1e-4` compared to PyTorch. Not quite sure why since the accumulation...
Great to hear! Please let me know if you encounter any further issues.
> @mvpatel2000 out of curiosity, may I ask which GPUs you used to run this test? It's much faster than what I got. Are these H100? I was...
> We are also using 2xA100 with the same code snippet than shared above but our throughput is quite a bit lower, around ~4.1 ba/sec when you reach >6.6ba/sec. ```...
Thanks for flagging this!

> My best guess at a clean solution is to determine the microbatch size with the uncompiled model so that only one compilation needs to be...