Examining arrays documentation

Open DraTeots opened this issue 2 years ago • 6 comments

Which documentation?

Tutorials site

What needs to be documented?

It would help if this issue could prioritize writing the next part of the documentation.

This comes up every time uproot/awkward are used, and every time it is a pain to find or remember:

https://awkward-array.org/how-to-examine-type.html

Mar 11 '22 02:03 DraTeots

I've been looking at my inability to write much documentation in the past 2 years and thinking that the best way to fix it is to promise less. The Sphinx/readthedocs site is complete and has all of this information in it; the purpose of the JupyterBooks/Netlify site was to provide more conversational, introductory tutorials. Part of the reason I can never find time to write tutorials here is that I'm presenting live tutorials in various places (13 so far, ranging in length from 1 hour to 3 days). These have notebooks with conversational intros; maybe I should just link to them.

So, this is what I've had in mind for the documentation, an idea I've been thinking about in the past few months:

  1. Have the https://awkward-array.org address point to readthedocs, the complete site: what is currently https://awkward-array.readthedocs.io (as a synonym; both would work).
  2. Shut down the Netlify site and the JupyterBooks-to-HTML generation that breaks with every new version of JupyterBooks.
  3. Copy the tutorial-like documentation that does exist into readthedocs, separating the "tutorials" section from the "reference" section as it is separated in the left-bar of the Uproot readthedocs.
  4. Also include links to the notebooks and videos of the live tutorials I've given, and keep that list updated.

What this would drop is the split between two sites (and the third site, the C++ doxygen, which is becoming less and less relevant—it would continue to exist but be downplayed) and that list of promised-but-not-written tutorials on the left-bar of the current Netlify site (awkward-array.org).

I think what's happening is that people click to them, see that they're not written, and then assume that the information is not available, when actually it's just somewhere else. (The top two comments in Hacker News were about the broken and missing documentation. It might have been different if the website channeled them to the Sphinx references and all the completed tutorials.)

As a "user" of documentation, what do you think of this plan?

Mar 11 '22 16:03 jpivarski

I absolutely agree that something has to be done about the documentation as I've experienced the exact scenario you mention - "see that they're not written, and then assume that the information is not available, when actually it's just somewhere else".

Merging the tutorials into the existing readthedocs seems like a good idea to me. Including links to the live tutorials as you suggest would be useful too.

Apr 11 '22 18:04 dcervenkov

@jpivarski I don't think we want to lose the "best practices" Netlify content unless we really have no choice. I found it initially tricky to identify best practices, and in the programming space I feel that tutorials are less common than in HEP. The real issue here is writing and updating the docs, and I sympathise (& empathise!) with you on that front. Once my time frees up in the future, I would be happy to work on filling out these docs.

I agree that the C++ docs will be much less important in future. I'd propose that we have a single top-level domain (awkward-array.org), and we have two sections as you propose (reference, tutorials). I'd be happy to do most of the work here, particularly the JupyterBooks side of things. It intersects with my interests in the Jupyter space, and I think it would be useful. On the tutorials side, I think we'd benefit from condensing the https://github.com/jpivarski-talks/ notebooks into a series of examples (with open data where possible, rather than requiring a git clone), where we can bring in the full JupyterBook suite of features.

My availability for this right now is not high, but in future this will change.

Apr 11 '22 18:04 agoose77

I'd be grateful for the help, and also for the different perspective. My problem when writing tutorials is that they start looking like the references I've already written; there's no benefit to a long page describing one function, because those already exist.

For the organization, I still think it's necessary to have both Tutorials (which focus on getting started and best practices) and Reference (long page on one function), but I was thinking of collapsing the two websites into one website, with https://awkward-array.org pointing to it. Doxygen would still exist (and still be generated by Sphinx, as we have it now), but it would be buried. In fact, the growing three-level distinction in interface: end-user (high-level), downstream developer (mid-level), and internal (low-level) could further subdivide the Reference into three sections. This page, for instance, is deeply internal, as are all those pages with underscores, and the AwkwardForth documentation that I intend to move into the common docs. The distinction "this is public API, but not something you, a data analyst, should be getting into" is a hard one to make, and it would be easier if the awkward-array.org landing page was clearly split into such sections.

I point to https://uproot.readthedocs.io/ as an example of what that subdivision could look like on a single readthedocs site (the left-bar is subdivided in a useful way), though Uproot's not a great example because it doesn't have enough tutorials.

As for the technical point of generating tutorials from Jupyter notebooks, this has been a giant hassle. It's supposed to ensure that our documentation is up-to-date by executing it before it goes online, but not all errors are caught in the commands that are supposed to stop the build if executing the notebook fails:

https://github.com/scikit-hep/awkward-1.0/blob/87cf5792e474d6886aaa3baddc9cf2d87d895deb/.ci/azure-doctest-awkward.yml#L81-L93

I would be willing to turn the notebooks into ordinary pages, even though they're not checked, because the check is ineffective, anyway. Reducing this complexity would hopefully free up time to write documentation. (Also note that JupyterBooks isn't as stable as I thought it would be: they're still frequently changing their API in ways that we need to adapt to, particularly in the table of contents YAML.)
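To illustrate what a check that cannot silently pass might look like, here is a minimal sketch using only the standard library. It is not how the CI linked above works: `run_notebook_cells` is a hypothetical helper that executes the code cells of a notebook's JSON in order and lets any failure propagate, so the calling process exits with an error. It assumes plain Python cells (no IPython magics).

```python
import json


def run_notebook_cells(notebook_json: str) -> int:
    """Execute every code cell in order and return how many were run.

    Hypothetical helper: any exception in a cell is re-raised, so a
    failing cell cannot be skipped without the caller noticing.
    """
    notebook = json.loads(notebook_json)
    namespace = {}  # shared across cells, like a kernel's globals
    executed = 0
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell["source"]
        if isinstance(source, list):  # source may be a string or a list of lines
            source = "".join(source)
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception as err:
            raise RuntimeError(f"code cell {index} failed: {err!r}") from err
        executed += 1
    return executed


# Tiny in-memory notebook, invented for illustration
good = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["# title"]},
    {"cell_type": "code", "source": ["x = 1\n", "y = x + 1"]},
]})
print(run_notebook_cells(good))
```

Because the error is raised rather than logged, a script built on this would return a nonzero exit status on the first broken cell, which is the property the build step needs.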

On the whole, the documentation would be in better shape if I could stop thinking about it as a book that must be polished and well-organized and let it just be a bunch of responses to questions. When asked, I seem to have no trouble answering a specific question, but then all of these answers become hard to find. At one point (PyHEP 2018), I tried to encourage everyone to direct their questions to StackOverflow because then it would just be a searchable database of solutions, not something that would need to be explicitly organized. (That turned out to not be a good idea because newcomers were not treated well by the StackOverflow community, and random StackOverflow members didn't realize that users asking about Awkward Array need their solutions to be scalable, e.g. why ak.num(array) is better than [len(x) for x in array].)

For data, any of the scikit-hep-testdata files can be accessed through the GitHub "raw data" URL. That HTTP server doesn't support multi-GET, but it's fine for small files. The big files that some of these tutorials (https://github.com/orgs/jpivarski-talks/repositories?q=tutorial) are based on are all online, too, some of them in S3, which (if I remember right) does support multi-GET.

Apr 11 '22 19:04 jpivarski

I like the uproot example, and might be tempted to take it further. I like the way that NumPy's docs have different sections: https://numpy.org/doc/stable/user/index.html

I struggle with the fact that we have the entire API namespace in the left sidebar, and although it's compounded by internals & the v2 namespace, I still think it's better to move that into a separate API reference like NumPy does.

JupyterBooks has been changing a lot recently (v2), but I hope it is stabilising more now. Equally, we could choose to pin at a particular version and do incremental upgrades (if we don't already?).

Big :+1: on bringing in the AwkwardForth docs too. In itself AwkwardForth is something that we could advertise more as it very much sits on its own outside of Awkward Array in terms of use cases.

RE JupyterBook - leave it with me as a future milestone. I think we can get it to a good place, but right now there are so many things that are going on in Awkward CI that will simplify once v2 is released etc, that it's probably worth holding fire for now?

> On the whole, the documentation would be in better shape if I could stop thinking about it as a book that must be polished and well-organized and let it just be a bunch of responses to questions.

Yes, this is tough. I think that the tutorials should do a good-enough job of showcasing how to use Awkward, especially as our API evolves to an improving level of self-consistency and features. That said, the existing Netlify site is useful, and I anticipate being able to further it. If we look at Jupyter Book here too, we can ensure that it actually stays up to date (or at least, that the code that it recommends actually works).

In my mind, tutorials are "analysis workflows", whereas the Netlify equivalent is "snippets".

> newcomers were not treated well by the StackOverflow community,

Right, I noticed that a lot of replies just completely ignored the "Awkward" part :/ Thankfully I do think GitHub discussions is a useful solution here. We can signpost to it from the documentation once things are reshuffled.

> any of the scikit-hep-testdata files can be accessed through the GitHub "raw data" URL.

Fab!

Perhaps we should create a new Project in the Projects tab?

Apr 11 '22 19:04 agoose77

FYI: this is now on the Roadmap: https://github.com/scikit-hep/awkward-1.0/wiki#documentation-revamp

Apr 18 '22 19:04 jpivarski