Functional API document auto generation
🚀 The feature
Currently, we rely on the inline docstring of each DataPipe to indicate its functional API. Since torchdata should already be built by the time the docs are generated, we should be able to rely on the following dictionaries to map each functional API to its corresponding DataPipe:
- IterDataPipe: https://github.com/pytorch/pytorch/blob/b30c027abf6560dafb88d67ebd446851a5729651/torch/utils/data/datapipes/datapipe.py#L68
- MapDataPipe: https://github.com/pytorch/pytorch/blob/ea8a0184b76088c477c7cef4fe36d683d57fd880/torch/utils/data/datapipes/datapipe.py#L194
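As a rough sketch of what reading those dictionaries could look like: the helper below turns a registry into a functional name to class name table. It assumes each registry value is a functools.partial whose first bound argument is the DataPipe class, which is my reading of register_datapipe_as_function at the linked commits and may not hold in other versions. functional_api_table, _build and toy_registry are hypothetical names, and the toy classes stand in for real DataPipes so the snippet runs without torch.

```python
import functools


def functional_api_table(functions):
    """Map each functional name to the name of the DataPipe class it builds.

    Assumption: every registry value is a functools.partial whose first
    bound positional argument is the DataPipe class.
    """
    table = {}
    for name, fn in functions.items():
        is_partial = isinstance(fn, functools.partial) and fn.args
        target = fn.args[0] if is_partial else None
        table[name] = getattr(target, "__name__", "<unknown>")
    return dict(sorted(table.items()))


# Toy stand-ins so the sketch runs without torch installed.
class ShufflerIterDataPipe: ...
class MapperIterDataPipe: ...


def _build(cls, enable_df_api_tracing, source_dp, *args, **kwargs):
    # Hypothetical factory shaped like the one the registry is assumed to bind.
    return cls()


toy_registry = {
    "shuffle": functools.partial(_build, ShufflerIterDataPipe, False),
    "map": functools.partial(_build, MapperIterDataPipe, False),
}

table = functional_api_table(toy_registry)
print(table)
```

With torch installed, the same helper would presumably be called on IterDataPipe.functions and MapDataPipe.functions instead of toy_registry.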
Motivation, pitch
Make doc generation smarter to reduce the work for contributors implementing a DataPipe and the extra care required from reviewers.
Alternatives
No response
Additional context
WDYT @NivekT
I think this is certainly possible. There are a few paths we can go down:
- Stick with the existing layout and inject the functional API names into the docstrings via a pre-processing function in conf.py during the generation process, rather than having contributors explicitly add them to the docstring
- As we previously discussed, create a separate table/page with functional APIs
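For the first option, a minimal sketch of such a pre-processing hook, assuming the docs are built with Sphinx autodoc and using its autodoc-process-docstring event. make_docstring_hook and the class_to_functional mapping are hypothetical names, and the appended note only imitates the wording that contributors currently write by hand.

```python
def make_docstring_hook(class_to_functional):
    """Build a handler for Sphinx's ``autodoc-process-docstring`` event.

    ``class_to_functional`` is a hypothetical mapping from DataPipe class
    name to functional API name, e.g. derived from the ``functions`` dicts.
    """

    def add_functional_name(app, what, name, obj, options, lines):
        if what != "class":
            return
        functional = class_to_functional.get(getattr(obj, "__name__", None))
        if functional is None:
            return
        note = f"(functional name: ``{functional}``)"
        # Skip docstrings that already carry the note.
        if any(note in line for line in lines):
            return
        # autodoc expects ``lines`` to be modified in place.
        if lines:
            lines[0] = f"{lines[0]} {note}"
        else:
            lines.append(note)

    return add_functional_name


# In conf.py this would be wired up roughly as:
#     def setup(app):
#         app.connect("autodoc-process-docstring", make_docstring_hook(mapping))


class ShufflerIterDataPipe:
    """Toy stand-in for a real DataPipe class."""


hook = make_docstring_hook({"ShufflerIterDataPipe": "shuffle"})
doc_lines = ["Shuffles the input DataPipe with a buffer."]
hook(None, "class", "Shuffler", ShufflerIterDataPipe, None, doc_lines)
print(doc_lines[0])
```

The handler is called directly here only to show the effect. In a real build Sphinx would call it once per documented object.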
I think the priority is relatively low since I don't anticipate adding too many more DataPipes (unless the current doc layout is unhelpful for users and needs a revamp). Nonetheless, this is still worth thinking about and tracking.
Agree on the priority. We can spend a few days on this for BE next half~
I want to add another potential improvement for pyi gen.
Currently, the type hint for the return value of each functional API is either IterDataPipe or MapDataPipe. We could change the type to each specific DataPipe class. For example:
class IterDataPipe:
    def shuffle(self, *, buffer_size, unbatch_level) -> ShufflerIterDataPipe: ...  # changed from IterDataPipe
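To make the idea concrete, here is a small sketch of how a stub generator could emit class-specific return types from a functional name to class name mapping. render_stub and the module name are hypothetical, and the signatures are reduced to *args, **kwargs for brevity, whereas a real generator would copy each DataPipe's actual parameters.

```python
def render_stub(functional_to_class, module):
    """Render a .pyi fragment whose methods return the specific DataPipe class.

    ``functional_to_class`` maps functional API name -> class name;
    ``module`` is a placeholder for wherever those classes are importable from.
    """
    classes = sorted(set(functional_to_class.values()))
    import_line = f"from {module} import " + ", ".join(classes)
    methods = [
        f"    def {name}(self, *args, **kwargs) -> {cls}: ..."
        for name, cls in sorted(functional_to_class.items())
    ]
    return "\n".join([import_line, "", "class IterDataPipe:"] + methods)


stub = render_stub(
    {"shuffle": "ShufflerIterDataPipe", "map": "MapperIterDataPipe"},
    module="hypothetical_datapipes_module",
)
print(stub)
```

Each specific return type needs a matching import at the top of the stub, so the import line grows with the number of DataPipes.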
I think this is a good idea, though import statements at the top might get a bit long.