[FEA] Use a single integrated documentation solution across components

Open vyasr opened this issue 2 years ago • 4 comments

Is your feature request related to a problem? Please describe.

Currently the C++ documentation and Python documentation are managed and published completely separately. The Python documentation uses Sphinx, while the C++ documentation uses Doxygen. Sphinx makes it significantly easier to include documentation beyond API docs (e.g. user guides or detailed topic references), while Doxygen is much more focused on API documentation alone. #11475 demonstrates how non-API documentation can be integrated with Doxygen docs, but this approach is limited relative to the flexibility that Sphinx supports. Additionally, Sphinx styling is easier to modify due to the large number of available themes and the knobs that can be easily turned for them. It would be nice if all of our documentation for the different language libraries (as well as different components like developer docs and API docs) could be centralized and presented in a unified manner.

Describe the solution you'd like

We should consider migrating all of our documentation to use the new Omniverse documentation system. It provides a single, unified platform for building both C++ and Python documentation into a Sphinx document. It supports the exact sets of documentation that we already use (Doxygen for C++ API docs and rST for Python API docs) while also making it easy to add all the extra pages that we might wish (and which already exist for the Python documentation).

Describe alternatives you've considered

One oft-cited benefit of our current documentation layout is that it maintains an alignment with the pandas documentation. This makes it easier for users to find the corresponding APIs between the two libraries. While migrating to the Omniverse documentation system would be a great solution for unifying our documentation and providing a layout and style that is very on-brand for NVIDIA tooling, the different styling may cause some dissonance for readers. If we think this is a significant issue (although I don't anticipate this being the case), we could consider using Breathe directly in our Sphinx docs. Breathe is what allows Sphinx docs to talk to Doxygen and parse those API docs (it's what the Omniverse documentation system uses under the hood), and we could leverage it directly in our existing Sphinx documentation. This would give us a unified approach to documentation while still retaining the pandas-compatible style.
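For illustration, here is a minimal sketch of what wiring Breathe into an existing Sphinx `conf.py` might look like. It assumes Doxygen has been run with XML output enabled; the project name `libcudf` and the XML path are hypothetical placeholders, not the actual cudf configuration.

```python
# conf.py fragment (sketch only; names and paths are assumptions)

# Enable Breathe alongside the usual autodoc extension.
extensions = [
    "sphinx.ext.autodoc",
    "breathe",
]

# Map a Breathe project name to the directory holding Doxygen's XML output.
# Both the name and the path below are hypothetical.
breathe_projects = {"libcudf": "../../cpp/doxygen/xml"}
breathe_default_project = "libcudf"

# Keep the existing PyData theme so the pandas-like layout is preserved.
html_theme = "pydata_sphinx_theme"
```

With something like this in place, C++ entities could be pulled into rST pages through Breathe directives such as `doxygenfunction`, while the Python pages continue to be built as they are today.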

Additional context

Migrating all of our documentation, whether to Omniverse or to Breathe, is a large change that will need to be synchronized across all of RAPIDS. It will affect both user- and developer-facing documentation, so the effects should be carefully considered. Moreover, we should expect that the combination of Breathe and the PyData Sphinx theme that we use will have some incompatibilities that will need to be addressed, and Breathe may affect formatting in surprising ways, so we'll need to do a thorough review. As such, any effort to modify the cudf documentation in this manner should be viewed as a proof of concept (POC) to be demoed to and discussed across all of RAPIDS before any changes are finalized.

One additional minor point: we need to make sure that whatever system we choose supports documentation in Cython files appropriately. This shouldn't be a problem for direct usage of Breathe, but I don't know enough about how the Omniverse documentation system works under the hood to be entirely certain that it doesn't make additional assumptions that we would need relaxed to support Cython docstrings.

vyasr • Aug 05 '22 16:08

CC @rlratzel @dantegd @harrism @shwina for perspectives from cuGraph/cuML/cuSpatial/RMM. Feel free to tag others as well of course.

vyasr • Aug 05 '22 16:08

It sounds like this is proposing two things:

  1. Migrating C++ documentation from Doxygen to Sphinx via Breathe
  2. In addition, integrating both Python and C++ documentation via the Omniverse documentation system

I'd defer the decision about (1) to yourself, @GregoryKimball and the other libcudf C++ devs.

Regarding (2), I have several questions, but primarily:

  1. Will it be open source?
  2. Is the primary goal uniform branding across the Python and C++ documentation? If so, does this rule out integrating node-rapids or potentially bindings for other languages?
  3. What changes, if any, are required to the Python code and documentation sources?

Overall, I feel like (1) can be done independently, and in support of, (2). So we could forge ahead with (1) and make a decision about (2) later.

shwina • Aug 05 '22 18:08

Those are all great questions. I'll address those that I can.

My main reservation with your suggestion to move forward with (1) independently is that I don't know to what extent doing (1) helps with (2). It would probably help us iron out any conflicts between our style and Breathe, but I don't know if that will translate to issues that we run into with the Omniverse template. Ultimately all of these docs boil down to playing with project-specific config files and I don't think the Omniverse ones are that similar.

  1. I am not sure whether the Omniverse docs will eventually become open source. I am also not sure whether that's necessarily a requirement for us, since that would just be the tool that we use to build and publish our documentation. Granting that we would prefer an open source tool, I don't think it's a showstopper if it stays closed source.
  2. I would say that the goals are to 1) unify to the greatest extent possible, and 2) improve the quality of the non-API components of the C++ documentation. It would be great to unify Java and JS documentation as well, but I'm not trying to boil the ocean here. I would say that getting C++ onto Sphinx alone would be a marked improvement in our ability to add additional (non-API) types of documentation much more easily. The C++ code could use user guides, for example. In that respect, moving forward with Breathe even without Omniverse could indeed be beneficial.
  3. I think that the main change would be that a lot of what currently goes in conf.py would now go into the repo.toml file. I don't know if it supports everything that we currently use or need.

vyasr • Aug 05 '22 18:08

> I don't think it's a showstopper if it stays closed source.

I'd be very hesitant to replace open-source tooling for something as fundamental as building docs with anything closed-source - especially when we (cuDF/RAPIDS) cannot support the latter. Doc contributions are typically low-hanging fruit for new contributors and I'd love for that to be the case for RAPIDS as well.

shwina • Aug 05 '22 19:08

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] • Sep 04 '22 20:09

Having an integrated solution would also be very beneficial as we move towards exposing pylibcudf as a public API. Since pylibcudf functions will all be minimal wrappers around libcudf functions, being able to cross-link to the libcudf docs from pylibcudf docstrings would make those docs much simpler to write.
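As a rough sketch of what such cross-linking could look like, the docstring below uses Sphinx's C++ domain role `:cpp:func:` to point at the corresponding libcudf function. This assumes the C++ API is registered in the same Sphinx build (for example via Breathe) so that the reference can resolve. The Python function is a hypothetical stub for illustration and is not the actual pylibcudf signature.

```python
def gather(source_table, gather_map):
    """Gather rows of ``source_table`` at the positions in ``gather_map``.

    This is a thin wrapper; for the full semantics, see
    :cpp:func:`cudf::gather`.
    """
    # Hypothetical stub: a real wrapper would call into libcudf here.
    raise NotImplementedError("illustrative stub only")
```

The wrapper docstring stays short, and the detailed behavior is documented once on the C++ side.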

vyasr • Jul 11 '23 19:07