Add util to collect profiler traces.
While this is a large PR according to SLOC, most of the changes are mechanical. In order to collect traces, we need to do a bit of standardization:
- Standardize to `niter` across all models. (Currently some use `niterations`.)
- Make `train` and `eval` take a `step_fn` argument so we know where to draw appropriate boundaries (e.g. for warmup); see the sketch after this list. Note that the model is free to define a "step" in whatever way makes sense for the context; we just care that the function is called. That said, most follow the pattern of forward, backward, optimizer step.
- Optionally, models can annotate the forward, backward, and optimizer parts of the code. Unusual models like RL or GANs don't, but for most models this is straightforward and leads to a nicely annotated profile.
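As a rough sketch of what this standardization looks like (hypothetical names throughout: `BenchmarkModel`, `self.module`, `self.example_inputs`, and `self.optimizer` stand in for whatever a given model defines, and the annotations use `torch.profiler.record_function`, which may not be exactly what the PR uses):

```python
import torch

class BenchmarkModel:
    # Hypothetical sketch of the standardized interface: `train` takes a
    # `step_fn` callback so the harness knows where step boundaries fall
    # (e.g. to discard warmup steps).
    def train(self, niter=1, step_fn=lambda: None):
        for _ in range(niter):
            # Optional annotations; unusual models (RL, GANs) skip them.
            with torch.profiler.record_function("forward"):
                loss = self.module(self.example_inputs).sum()
            with torch.profiler.record_function("backward"):
                loss.backward()
            with torch.profiler.record_function("optimizer"):
                self.optimizer.step()
                self.optimizer.zero_grad()
            step_fn()  # harness-defined: marks the end of one step
```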
Interestingly, this project flushed out a bunch of bugs just by running with `niter > 1` or by inspecting the code during standardization:
- In DLRM it wasn't possible to run with `niter > 1`, because the forward pass was outside of the loop.
- We weren't zeroing grads for any of the huggingface models.
- `train` and `eval` are identical for `maml`, so we were just duplicating work.
- Some of the torchvision models had drifted from the gen'd version due to a merge conflict resolution 4 months ago.
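To make the first two concrete, here is a schematic reconstruction (not the actual DLRM or huggingface code) of the buggy loop shape versus the fixed one:

```python
# Buggy shape (schematic): the forward pass runs once, outside the loop,
# so niter > 1 can't replay it, and grads silently accumulate because
# zero_grad() is never called.
loss = model(inputs).sum()
for _ in range(niter):
    loss.backward(retain_graph=True)
    optimizer.step()

# Fixed shape: the full forward/backward/optimizer step runs every
# iteration, with grads zeroed each time.
for _ in range(niter):
    optimizer.zero_grad()
    loss = model(inputs).sum()
    loss.backward()
    optimizer.step()
```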
Aside from that, the two main blocks of new code are the `collect_diagnostics` / `_collect_diagnostics` methods in `torchbenchmark/__init__.py` and the new top-level `diagnostics.py`. The high-level flow is to do some runs to characterize the behavior, and then take a lightweight profile and a detailed profile. The former keeps distortion low, while the latter makes it easy to follow up (e.g. to check the batch size).
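A minimal sketch of that flow using the public `torch.profiler` API (the actual `_collect_diagnostics` implementation may differ; `run_step` is a hypothetical stand-in for a single benchmark step):

```python
import torch
from torch.profiler import profile, ProfilerActivity

def collect_traces(run_step, out_prefix):
    # Lightweight profile: op timings only, to keep distortion low.
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as light:
        run_step()
    light.export_chrome_trace(f"{out_prefix}_light.json")

    # Detailed profile: shapes and stacks make follow-up easy (e.g.
    # checking the batch size), at the cost of more overhead.
    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        record_shapes=True,
        with_stack=True,
    ) as detailed:
        run_step()
    detailed.export_chrome_trace(f"{out_prefix}_detailed.json")
```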
One of the interesting side effects of comparing normal step time to profiled step time is that you can see which models are dispatch bound, or very close to it. (All of these statistics are recorded in `traces/summary.txt`.) We should audit all of the traces as a team to sanity check which ones provide useful signal and which ones need to be tweaked.
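Roughly, the dispatch-bound check amounts to the following (a hypothetical sketch, not the code in `diagnostics.py`; `run_step` is again a stand-in):

```python
import time
from torch.profiler import profile

def profiler_overhead_ratio(run_step, n=10):
    # Average step time without the profiler.
    start = time.perf_counter()
    for _ in range(n):
        run_step()
    plain = (time.perf_counter() - start) / n

    # Average step time under the profiler. A ratio near 1 means the model
    # does substantial per-op work; a large ratio means per-op (dispatch)
    # overhead dominates, i.e. the model is dispatch bound.
    with profile():
        start = time.perf_counter()
        for _ in range(n):
            run_step()
        profiled = (time.perf_counter() - start) / n
    return profiled / plain
```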
FYI: a bootcamper recently helped add the `PyTorch-UNet` model in https://github.com/pytorch/benchmark/pull/487, and it seems to fit the forward-backward-optimize pattern, so maybe we should add support for that model as well?