Add util to collect profiler traces.
While this is a large PR according to SLOC, most of the changes are mechanical. In order to collect traces, we need to do a bit of standardization:
- Standardize to `niter` across all models. (Currently some use `niterations`.)
- Make `train` and `eval` take a `step_fn` argument so we know where to draw appropriate boundaries (e.g. for warmup); see the sketch after this list. Note that the model is free to define a "step" in whatever way makes sense for the context; we just care that the function is called. That said, most follow the pattern of forward, backward, optimizer step.
- Optionally, models can annotate the forward, backward, and optimizer parts of the code. Unusual models like RL or GANs don't, but for most models this is straightforward and leads to a nicely annotated profile.
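As a rough sketch of what this standardization looks like (hypothetical names throughout: `BenchmarkModel`, `self.module`, `self.example_inputs`, and `self.optimizer` stand in for whatever a given model defines, and the annotations use `torch.profiler.record_function`, which may not be exactly what the PR uses):

```python
import torch

class BenchmarkModel:
    # Hypothetical sketch of the standardized interface: `train` takes a
    # `step_fn` callback so the harness knows where step boundaries fall
    # (e.g. to discard warmup steps).
    def train(self, niter=1, step_fn=lambda: None):
        for _ in range(niter):
            # Optional annotations; unusual models (RL, GANs) skip them.
            with torch.profiler.record_function("forward"):
                loss = self.module(self.example_inputs).sum()
            with torch.profiler.record_function("backward"):
                loss.backward()
            with torch.profiler.record_function("optimizer"):
                self.optimizer.step()
                self.optimizer.zero_grad()
            step_fn()  # harness-defined: marks the end of one step
```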
Interestingly, this project flushed out a bunch of bugs just by running with `niter > 1` or by inspecting the code during standardization:
- In DLRM it wasn't possible to run with `niter > 1`, because the forward pass was outside of the loop.
- We weren't zeroing grads for any of the huggingface models.
- `train` and `eval` are identical for `maml`, so we were just duplicating work.
- Some of the torchvision models had drifted from the gen'd version due to a merge conflict resolution 4 months ago.
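To make the first two concrete, here is a schematic reconstruction (not the actual DLRM or huggingface code) of the buggy loop shape versus the fixed one:

```python
# Buggy shape (schematic): the forward pass runs once, outside the loop,
# so niter > 1 can't replay it, and grads silently accumulate because
# zero_grad() is never called.
loss = model(inputs).sum()
for _ in range(niter):
    loss.backward(retain_graph=True)
    optimizer.step()

# Fixed shape: the full forward/backward/optimizer step runs every
# iteration, with grads zeroed each time.
for _ in range(niter):
    optimizer.zero_grad()
    loss = model(inputs).sum()
    loss.backward()
    optimizer.step()
```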
Aside from that, the two main blocks of new code are the `collect_diagnostics` / `_collect_diagnostics` methods in `torchbenchmark/__init__.py` and the new top-level `diagnostics.py`. The high-level flow is to do some runs to characterize the behavior, and then take a lightweight profile and a detailed profile. The former keeps distortion low, while the latter makes it easy to follow up (e.g. to check the batch size).
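A minimal sketch of that flow using the public `torch.profiler` API (the actual `_collect_diagnostics` implementation may differ; `run_step` is a hypothetical stand-in for a single benchmark step):

```python
import torch
from torch.profiler import profile, ProfilerActivity

def collect_traces(run_step, out_prefix):
    # Lightweight profile: op timings only, to keep distortion low.
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as light:
        run_step()
    light.export_chrome_trace(f"{out_prefix}_light.json")

    # Detailed profile: shapes and stacks make follow-up easy (e.g.
    # checking the batch size), at the cost of more overhead.
    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        record_shapes=True,
        with_stack=True,
    ) as detailed:
        run_step()
    detailed.export_chrome_trace(f"{out_prefix}_detailed.json")
```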
One of the interesting side effects of comparing normal step time to profiled step time is that you can see which models are dispatch bound, or very close to it. (All of these statistics are recorded in `traces/summary.txt`.) We should audit all of the traces as a team to sanity check which ones provide useful signal and which ones need to be tweaked.
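Roughly, the dispatch-bound check amounts to the following (a hypothetical sketch, not the code in `diagnostics.py`; `run_step` is again a stand-in):

```python
import time
from torch.profiler import profile

def profiler_overhead_ratio(run_step, n=10):
    # Average step time without the profiler.
    start = time.perf_counter()
    for _ in range(n):
        run_step()
    plain = (time.perf_counter() - start) / n

    # Average step time under the profiler. A ratio near 1 means the model
    # does substantial per-op work; a large ratio means per-op (dispatch)
    # overhead dominates, i.e. the model is dispatch bound.
    with profile():
        start = time.perf_counter()
        for _ in range(n):
            run_step()
        profiled = (time.perf_counter() - start) / n
    return profiled / plain
```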
FYI: a bootcamper recently helped add the `PyTorch-UNet` model in https://github.com/pytorch/benchmark/pull/487, and it seems to fit the forward-backward-optimize pattern, so maybe we should add support for that model as well?