Functional API document auto generation

Open ejguan opened this issue 2 years ago • 4 comments

🚀 The feature

Currently, we rely on the inline doc for each DataPipe to indicate its functional API. Since torchdata should already be built by the time the docs are generated, we should be able to rely on the following dictionaries to figure out the functional API for each corresponding DataPipe.

  • IterDataPipe: https://github.com/pytorch/pytorch/blob/b30c027abf6560dafb88d67ebd446851a5729651/torch/utils/data/datapipes/datapipe.py#L68
  • MapDataPipe: https://github.com/pytorch/pytorch/blob/ea8a0184b76088c477c7cef4fe36d683d57fd880/torch/utils/data/datapipes/datapipe.py#L194
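A minimal sketch of that lookup, written against a toy stand-in for the registry so it runs without torch. It assumes, based on the linked `datapipe.py`, that each `functions` dictionary maps a functional name to a `functools.partial` whose first positional argument is the registered DataPipe class. `ToyIterDataPipe`, `toy_functional_datapipe` and `functional_api_table` are hypothetical names for illustration only.

```python
import functools
from typing import Callable, Dict


class ToyIterDataPipe:
    """Toy stand-in for IterDataPipe; only the registry part is modeled."""

    # Assumption: mirrors the class-level `functions` dict in datapipe.py,
    # mapping functional name -> functools.partial(class_function, dp_cls, ...).
    functions: Dict[str, Callable] = {}


def toy_functional_datapipe(name: str):
    """Hypothetical decorator that registers a DataPipe class under `name`."""

    def register(dp_cls):
        def class_function(cls, source_dp, *args, **kwargs):
            return cls(source_dp, *args, **kwargs)

        # Store a partial whose first positional argument is the DataPipe class.
        ToyIterDataPipe.functions[name] = functools.partial(class_function, dp_cls)
        return dp_cls

    return register


@toy_functional_datapipe("shuffle")
class ShufflerIterDataPipe(ToyIterDataPipe):
    pass


@toy_functional_datapipe("batch")
class BatcherIterDataPipe(ToyIterDataPipe):
    pass


def functional_api_table(functions: Dict[str, Callable]) -> Dict[str, str]:
    """Map each functional name to the name of the DataPipe class behind it."""
    table = {}
    for fn_name, fn in sorted(functions.items()):
        # Recover the registered class from the partial's bound arguments.
        dp_cls = fn.args[0] if isinstance(fn, functools.partial) and fn.args else None
        table[fn_name] = dp_cls.__name__ if dp_cls is not None else "<unknown>"
    return table


print(functional_api_table(ToyIterDataPipe.functions))
```

In a real doc build, the same loop would run over `IterDataPipe.functions` and `MapDataPipe.functions` after importing torchdata, so that its registrations are present. The `partial.args[0]` layout is an internal detail worth re-checking for the torch version in use.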

Motivation, pitch

Make the doc-gen smarter, reducing the work contributors have to do when implementing a DataPipe and the extra care required from reviewers.

Alternatives

No response

Additional context

WDYT @NivekT

ejguan • May 05 '22 13:05

I think this is certainly possible. There are a few paths we can go down:

  1. Stick with the existing layout and inject the functional API names into the docstrings via a pre-processing function in conf.py during doc generation, rather than having contributors add them to each docstring explicitly
  2. As we previously discussed, create a separate table/page with functional APIs

I think the priority is relatively low since I don't anticipate adding many more DataPipes (unless the current doc layout turns out to be unhelpful for users and needs a revamp). Nonetheless, this is still worth thinking about and tracking.

NivekT • May 05 '22 14:05

Agree on the priority. We can spend a few days on this for BE next half~

ejguan • May 05 '22 14:05

I want to add another potential improvement for pyi gen. Currently, the type hint for the return value of each functional API is either IterDataPipe or MapDataPipe. We could change it to the specific DataPipe class. For example:

```python
class IterDataPipe:
    def shuffle(self, *, buffer_size: int = ..., unbatch_level: int = ...) -> ShufflerIterDataPipe: ...  # Change it from IterDataPipe
```
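A rough sketch of what the stub generator could emit under this proposal, assuming it already knows, for each functional API, the functional name, the concrete DataPipe class, and the parameter text. `FunctionalSpec` and `render_class_stub` are hypothetical names, and the parameter lists are abbreviated for illustration rather than copied from torch.

```python
from typing import List, NamedTuple


class FunctionalSpec(NamedTuple):
    """Hypothetical record for one registered functional API."""

    name: str       # functional name, e.g. "shuffle"
    dp_class: str   # DataPipe class that implements it
    params: str     # parameters after `self`, as stub text


def render_class_stub(base: str, specs: List[FunctionalSpec]) -> str:
    """Render a .pyi class whose functional methods return the concrete DataPipe."""
    lines = [f"class {base}:"]
    for spec in sorted(specs, key=lambda s: s.name):
        args = f"self, {spec.params}" if spec.params else "self"
        # Return the specific class instead of the generic `base`.
        lines.append(f"    def {spec.name}({args}) -> {spec.dp_class}: ...")
    if len(lines) == 1:
        lines.append("    ...")  # keep an empty class body syntactically valid
    return "\n".join(lines)


specs = [
    FunctionalSpec("shuffle", "ShufflerIterDataPipe",
                   "*, buffer_size: int = ..., unbatch_level: int = ..."),
    FunctionalSpec("batch", "BatcherIterDataPipe",
                   "batch_size: int, drop_last: bool = ..."),
]
print(render_class_stub("IterDataPipe", specs))
```

Each concrete return type also has to be imported at the top of the generated stub, which is where the extra import lines come from.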

ejguan • Jun 30 '22 15:06

I think this is a good idea, though the import statements at the top of the generated .pyi might get a bit long.
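One way to keep that header manageable, sketched under the assumption that the generator can collect a (module, class name) pair for every return type: merge names per module and wrap long lines in the parenthesized form. `render_imports` is a hypothetical helper, and the module paths in the example are illustrative; verify them against the installed torch version.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def render_imports(pairs: Iterable[Tuple[str, str]], width: int = 79) -> List[str]:
    """Collapse (module, name) pairs into one `from module import ...` per module.

    Lines longer than `width` are wrapped in parenthesized form, one name per line.
    """
    by_module: Dict[str, Set[str]] = defaultdict(set)
    for module, name in pairs:
        by_module[module].add(name)  # duplicates collapse automatically

    out: List[str] = []
    for module in sorted(by_module):
        names = sorted(by_module[module])
        single = f"from {module} import {', '.join(names)}"
        if len(single) <= width:
            out.append(single)
        else:
            out.append(f"from {module} import (")
            out.extend(f"    {n}," for n in names)
            out.append(")")
    return out


pairs = [
    ("torch.utils.data.datapipes.iter.combinatorics", "ShufflerIterDataPipe"),
    ("torch.utils.data.datapipes.iter.grouping", "BatcherIterDataPipe"),
    ("torch.utils.data.datapipes.iter.grouping", "UnBatcherIterDataPipe"),
]
print("\n".join(render_imports(pairs)))
```

Grouping by module keeps the header to roughly one statement per source module rather than one line per DataPipe.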

NivekT • Jun 30 '22 17:06