[Datasets] Refactor Ray Data API documentation
Signed-off-by: Cheng Su [email protected]
Why are these changes needed?
Please review the new documentation in https://ray--31204.org.readthedocs.build/en/31204/data/api/api.html .
This PR refactors the Ray Data API documentation as specified in https://github.com/ray-project/ray/issues/30692 . The motivation is a better layout and view of all API references, in line with other systems (Pandas, NumPy and Spark).
The new structure has 3 pages:
1. Index page (same as before, link)
2. Summary page for each group of APIs (newly added, link)
3. Doc page for individual APIs (newly added, link)
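To illustrate the summary-page idea, here is a rough sketch and not how this PR or the docs build actually generates the pages. It lists each public method of a class with the first line of its docstring and the path of its individual doc page. `ToyDataset`, `summary_rows`, and the `toy.data` prefix are hypothetical names made up for the example; the `doc/<qualified name>.html` pattern is assumed from the preview URLs in this PR.

```python
import inspect


class ToyDataset:
    """Hypothetical stand-in for an API class; not part of Ray."""

    def map_batches(self, fn):
        """Apply fn to batches of records.

        The longer description would live on the per-API page.
        """

    def count(self):
        """Return the number of records."""

    def _internal(self):
        """Private helper, omitted from the summary."""


def summary_rows(cls, prefix):
    """Return (qualified name, one-line summary, page path) per public method."""
    rows = []
    # getmembers returns members sorted by name.
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        doc = inspect.getdoc(member) or ""
        first_line = doc.splitlines()[0] if doc else ""
        qualified = f"{prefix}.{cls.__name__}.{name}"
        # Assumed page layout: one HTML page per API under doc/.
        rows.append((qualified, first_line, f"doc/{qualified}.html"))
    return rows


rows = summary_rows(ToyDataset, "toy.data")
for qualified, summary, page in rows:
    print(f"{qualified:32} {summary:36} {page}")
```

Each row corresponds to one line of a summary page (level 2), and the page path points to the individual doc page (level 3).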
In addition, the sidebar also has a list of APIs per category.
Related issue number
Closes https://github.com/ray-project/ray/issues/30692
Checks
- [x] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- [x] I've run scripts/format.sh to lint the changes in this PR.
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(
I think we will also need to take care of the consistency with other Ray libraries. This will make data API docs different than the rest of Ray, so we may need to reach a consensus with the ray-docs group that this is the structure we want for Ray overall.
Yes, this is fair. WDYT? @maxpumperla, @ericl and @zhe-thoughts, thanks.
https://ray--31204.org.readthedocs.build/en/31204/data/api/doc/ray.data.Dataset.map_batches.html --- if you look at this, it's very hard to read. There are too many args, the type signature is way too verbose, and there are 5 different Tip blocks.
Yes, and it's the same as the current page. This PR just moves the doc of map_batches to a separate page. IMO the current page is even harder to read. We should definitely improve the docs of individual APIs, but separately.
After this PR, every method now has a sidebar entry. This seems a bit much: could we hide these entries from the sidebar, or not use headings?
I don't have a strong opinion on this. WDYT @clarkzinzow? I remember you think the sidebar is very useful.
@ericl @c21 The sidebar entries are very useful for fast API doc scanning. I always use them for any library that has classes with many methods or modules with many functions, and they are a somewhat common practice in the Python data library ecosystem:
- Pandas: https://pandas.pydata.org/docs/reference/frame.html
- NumPy: https://numpy.org/doc/1.24/reference/arrays.ndarray.html
This could be considered less of a must-have now that we've reduced the index pages for e.g. the Dataset class to be a table-list of single-line method summaries and link-outs.
This PR is ready for review again. cc @ericl, @maxpumperla, @clarkzinzow, @jianoaix and @richardliaw. Thanks.
After this PR, every method now has a sidebar entry. This seems a bit much: could we hide these entries from the sidebar, or not use headings?
@ericl - this seems to be the default behavior, and I can't find an easy way to hide the entries by default. One can hide them manually by clicking the upper arrow, though. Is it a hard requirement to hide the sidebar entries by default?
I guess it's fine.
@c21 I think the sidebar behaviour can be considered a feature (we could still change it later). One of the major points here is to introduce another level of hierarchy. If a user drills down 3 levels to learn about our API, I don't see the issue with having granular info in the sidebar.