
data/benchmarks

msaroufim opened this pull request • 10 comments

Please read through our contribution guide prior to creating your pull request.

  • Note that there is a section on requirements related to adding a new DataPipe.

Fixes #416

Changes

  • Added CLI to run various benchmarks on various datasets and measure key metrics

msaroufim commented on May 19, 2022

May 24 notes

Notes from discussion with Vitaly

  • The libkineto profiler won't split iterator performance by datapipe; it will show up as one big block
  • Load image, transform, batch, and collate (collate will look like the biggest one) - measurements will be cumulative
  • Rotation, collation, etc. all need to happen before passing data into the DataLoader, i.e. each needs to be a datapipe map (see the sketch after this list) https://pytorch.org/data/main/torchdata.datapipes.map.html
  • Batch counting needs to happen before we receive a batch from the DataLoader https://github.com/pytorch/data/blob/411167bbca3b800b3f54d37674e8751e40a80e29/benchmarks/run_benchmark.py#L110
  • Check whether the autograd profiler works https://github.com/pytorch/pytorch/blob/master/torch/utils/data/datapipes/_typing.py#L462 - does it work in multiprocessing? With fork, everything will disappear
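
A minimal sketch of the layout described above - loading, rotation, batching, and collation all expressed as datapipe operations ahead of the DataLoader, with batches counted straight off the pipe. The file paths, `load_image` helper, and transform choices are illustrative, not the actual benchmark code:

```python
from PIL import Image
import torchvision.transforms as T
from torchdata.datapipes.iter import IterableWrapper


def load_image(path):
    # Illustrative decoder; the real benchmark would use its dataset's decoder
    return Image.open(path).convert("RGB")


transform = T.Compose([T.RandomRotation(10), T.Resize((224, 224)), T.ToTensor()])


def build_datapipe(image_paths, batch_size=32):
    dp = IterableWrapper(image_paths)
    dp = dp.map(load_image)              # load
    dp = dp.map(transform)               # rotate / resize / to-tensor as datapipe maps
    dp = dp.batch(batch_size).collate()  # batch + collate inside the pipe
    return dp


# Count batches as they come off the pipe, before any DataLoader wrapping
num_batches = sum(1 for _ in build_datapipe(["img_0.jpg", "img_1.jpg"]))
```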

May 31 notes

  • The PyTorch profiler needs to report fewer things - opened an issue on the Kineto repo https://github.com/pytorch/kineto/issues/609
  • Modularize the code a bit better so we can create baselines for datapipe/dataset vs DataLoader v1/v2 - make train.py take a generic iterator (see the sketch after this list)
  • Check whether the large call stack is because of shuffling done by torchvision
  • Do scaling after this is done
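
A rough sketch of what "train.py takes a generic iterator" could look like - the loop only assumes an iterable of (inputs, targets) batches, so the same entry point can be fed a raw datapipe, DataLoader v1, or DataLoader2 (the name and signature here are illustrative):

```python
from typing import Iterable, Tuple

import torch


def train_one_epoch(
    model: torch.nn.Module,
    batches: Iterable[Tuple[torch.Tensor, torch.Tensor]],
    optimizer: torch.optim.Optimizer,
) -> float:
    loss_fn = torch.nn.CrossEntropyLoss()
    total_loss = 0.0
    for inputs, targets in batches:  # any iterator of batches works here
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss
```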

msaroufim commented on May 24, 2022

OK, we can now get a trace and have an end-to-end example working. Data loading is not the bottleneck here yet, so I'll keep experimenting.


msaroufim commented on May 25, 2022

[screenshot from May 25, 2022]

msaroufim commented on May 26, 2022

The next thing I'd like to try is pulling in the torchtext datasets (e.g. `from torchtext.datasets import AmazonReviewFull`), after which we can start warming up GPUs.
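
A rough sketch of what that experiment might look like, assuming torchtext exposes `AmazonReviewFull` as an iterable datapipe of (label, text) pairs; the helper name and the sample limit are just for illustration:

```python
from torchtext.datasets import AmazonReviewFull


def count_samples(split="train", limit=10_000):
    # The dataset is a datapipe yielding (label, text) pairs
    dp = AmazonReviewFull(split=split)
    count = 0
    for label, text in dp:
        count += 1
        if count >= limit:
            break
    return count


print(count_samples())
```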

msaroufim commented on May 26, 2022

The call graph for datapipe construction is long, but data loading is not the bottleneck here since utilization is just 7%. I need to fix the collation problems and bump up the batch size.

Also, I finally figured out how to build torchtext from source, so I can use those datapipes as well: https://github.com/pytorch/text/issues/1743

[profiler screenshots from May 26, 2022]

msaroufim commented on May 27, 2022

Discussion with Vitaly June 21

  • Will focus on running on mc4 with DataLoader v1 on various hardware configurations (SSD, HDD) and use a few starter CloudFormation templates to make this easier

msaroufim commented on Jun 21, 2022

Overall, LGTM with a few comments. Let me know what additional features you plan to add.

nit: Need copyright headers for .py files

Thanks @NivekT, I will address all your feedback. As far as new features to add for this PR, not much I think - there's a bunch of cleanup I need to do:

  • Clean up the report into its own dataclass, which can then be exported to whatever format you want: HTML, Markdown, CSV, etc. (see the sketch after this list)
  • Address all your feedback
  • Some more cleanup
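
A rough sketch of that report-dataclass idea - the field names and export formats here are illustrative, not a final schema:

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class BenchmarkReport:
    dataset: str
    dataloader_version: str
    batch_size: int
    total_time_s: float
    samples_per_s: float

    def to_markdown(self) -> str:
        names = [f.name for f in fields(self)]
        values = [str(getattr(self, name)) for name in names]
        return (
            "| " + " | ".join(names) + " |\n"
            "| " + " | ".join("---" for _ in names) + " |\n"
            "| " + " | ".join(values) + " |"
        )

    def to_csv(self) -> str:
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(self)])
        writer.writeheader()
        writer.writerow(asdict(self))
        return buf.getvalue()


# Example usage with made-up numbers
report = BenchmarkReport("mc4", "v1", 32, 120.5, 850.0)
print(report.to_markdown())
```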

And I think the next PR should focus on integrating the AWS CLI into CI so we can benchmark a distributed-systems setup, per @NicolasHug's request

And after that we can see which of the partner integrations should be added to this setup as well

msaroufim commented on Jul 19, 2022

@msaroufim @VitalyFedyunin @NivekT following up on my earlier comments in https://github.com/pytorch/data/issues/416#issuecomment-1164404834 I also have a separate PR (https://github.com/pytorch/vision/pull/6196) that already provides support for the cross-product of:

  • Distributed Learning (DDP) vs 1-GPU training
  • Datapipes (with DataLoader or torchdata.dataloader2) vs Iterable datasets (non-DP) vs MapStyle Datasets
  • Full training procedure or Data-loading only (with or without transforms) or Model training only (generating fake datasets)
  • Timing of data-loading vs model training (see the generic sketch below)
  • any classification model from torchvision

(It also has FFCV support, but that's less relevant for us here).

Since it's directly adapted from torchvision recipes, it's also a bit closer to the kind of training that users would be doing in the wild.
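
For illustration only (this is not code from that PR), a generic way to split data-loading time from model time inside a single training loop, along the lines of the timing bullet above:

```python
import time

import torch


def timed_epoch(model, dataloader, optimizer, device="cuda"):
    # Returns (data_time, model_time) in seconds; an accurate GPU measurement
    # would also call torch.cuda.synchronize() around the model step.
    loss_fn = torch.nn.CrossEntropyLoss()
    data_time, model_time = 0.0, 0.0
    end = time.perf_counter()
    for images, targets in dataloader:
        data_time += time.perf_counter() - end      # time spent waiting on data
        start = time.perf_counter()
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), targets)
        loss.backward()
        optimizer.step()
        model_time += time.perf_counter() - start   # time spent in the model step
        end = time.perf_counter()
    return data_time, model_time
```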

Do you think it would make sense to join our benchmarking efforts here? I'm happy to provide support if you'd like to collaborate.

CC @nairbv

NicolasHug commented on Jul 21, 2022

@NicolasHug I am in the process of going through both setups, running them on our AWS cluster, and identifying the differences. I agree that combining the efforts is the right approach. Let me dig a bit deeper first and I can schedule a meeting for all of us to chat.

NivekT commented on Jul 21, 2022

@NicolasHug I think the right way to divide this up would be:

  • I work on the infra setup, the benchmark artifact and the benchmark export
  • I leverage your model training scripts since you're the domain expert

I would also eventually like to do something like pulling any of the HF datasets and benchmarking there, but I don't believe those datasets give me enough information to automatically create a toy model with the right shapes.

But yeah would love to talk

msaroufim commented on Jul 21, 2022