ENH, DOC: Add JupyterLite-powered interactive examples for the `pandas` documentation
- [x] follow-up for #60758 and #57896; closes #61060
- [ ] Tests added and passed if fixing a bug or adding a new feature
- [x] All code checks passed.
- [ ] Added type annotations to new arguments/methods/functions.
- [ ] Added an entry in the latest `doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.
Description
This PR is the first in a series of PRs to add WASM-based interactive elements to the pandas documentation. In particular, it relies on JupyterLite and jupyterlite-sphinx.
The interactive REPL on the website has been reconfigured to use the same JupyterLite deployment that jupyterlite-sphinx builds internally, so that we don't build a separate, duplicate deployment.
The TryExamples directive from jupyterlite-sphinx has been added globally to all Numpydoc-processed docstrings with an "Examples" section via a global_enable_try_examples configuration option. The buttons enabled by the directive have been styled accordingly.
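For illustration, the global opt-in described above might look roughly like the following in a Sphinx `conf.py`. This is a minimal sketch rather than the exact diff in this PR: the option names are taken from the jupyterlite-sphinx documentation, while the button and warning strings are placeholder assumptions.

```python
# Sketch of the relevant part of a Sphinx conf.py (values are illustrative).
extensions = [
    "numpydoc",
    "jupyterlite_sphinx",
]

# Add an interactive button to every Numpydoc "Examples" section
# without editing individual docstrings.
global_enable_try_examples = True

# Assumed wording; the actual strings used in this PR may differ.
try_examples_global_button_text = "Try it in your browser"
try_examples_global_warning_text = (
    "Interactive examples are experimental and may not behave "
    "exactly like a local pandas installation."
)
```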
Next steps
The follow-up steps after this PR will be to:
- address the mismatch between the versions of `pandas` available to use from the Pyodide distribution and the version of pandas that users are viewing the documentation for (most likely when we release Pyodide 0.28)
- expand interactive documentation elements for the long-form content in the "User Guide" section (this PR just makes the API examples interactive).
See also
- SciPy's interactive docs effort: https://github.com/scipy/scipy/issues/19729, https://github.com/scipy/scipy/pull/20019
- Similarly, NumPy's interactive docs effort: https://github.com/numpy/numpy/pull/26745
- JupyterLite support in `sphinx-gallery`: https://sphinx-gallery.github.io/stable/configuration.html#jupyterlite
- The `TryExamples` directive, used to enable API reference examples: https://jupyterlite-sphinx.readthedocs.io/en/latest/directives/try_examples.html
- https://github.com/scientific-python/summit-2024/issues/19
The remaining CI failures here look unrelated; I merged main to see if that makes a difference.
Also, it seems like you're adding the terminal to the getting started page. If I'm not missing something, this means that anyone visiting that page will download all the wasm stuff, which I don't think is great.
I agree with this. With respect to what happens in the docs, it could be the same issue. We don't want people to have to wait for docs pages to load because a bunch of downloading has to happen.
I'd also like to see how this actually looks when rendered. It also seems that we'd need to update a LOT of docs pages if there is going to be a "try it" button for each one of our rendered API pages that works correctly. But maybe I'm missing something.
Thanks for the feedback, @datapythonista and @Dr-Irv!
> Also, it seems like you're adding the terminal to the getting started page. If I'm not missing something, this means that anyone visiting that page will download all the wasm stuff, which I don't think is great.

> I agree with this. With respect to what happens in the docs, it could be the same issue. We don't want people to have to wait for docs pages to load because a bunch of downloading has to happen.
This is answered here: https://github.com/pandas-dev/pandas/pull/61061#discussion_r1982048551. No bandwidth is consumed when loading the Getting Started page in the docs, as the directive hides the terminal behind a button that a user would need to interact with. Also, the terminal is shared between the website and the docs, as everything is hosted on the same domain at https://pandas.pydata.org/. If someone uses the one on the website, the assets are cached for the one in the docs, should they use it (and vice versa).
> My personal suggestion was to move all this to a separate repository, publish it in GitHub Pages, and then we can link to it from our docs as needed. I think this is a very cool feature, but I think it's still something new that needs much faster iterations than pandas itself, and I don't think updates, fixes... should happen in this repository.
Yes, I suggested this in the previous PR at https://github.com/pandas-dev/pandas/pull/60758#issuecomment-2674106117. If we are to incorporate this suggestion I'll need someone from the pandas team to coordinate with me to create such a repository from the template at https://github.com/jupyterlite/demo, and I can help maintain it and subsequently move things here.
> I'd also like to see how this actually looks when rendered. It also seems that we'd need to update a LOT of docs pages if there is going to be a "try it" button for each one of our rendered API pages that works correctly. But maybe I'm missing something.
Oh, the "Try it" button is enabled across the API reference pages (for every valid Numpydoc-based "Examples" section in a public API method's docstring) through the `global_enable_try_examples` configuration option, so it doesn't need to be added manually per API example. I'm not sure if pandas has a documentation preview on its PRs where we can test it out. You may try out a sample from the NumPy devdocs here: https://numpy.org/devdocs/reference/generated/numpy.strings.replace.html#numpy.strings.replace (the corresponding PR which added this has been linked in the PR description). It's also possible to exclude certain pages through a regex pattern if making them interactive doesn't make sense (for example, numpy.distutils doesn't need to be, of course).
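To make the regex-based exclusion concrete, here is a small sketch of how such patterns could select pages. It assumes the patterns are matched against the page path, as the jupyterlite-sphinx documentation describes for its `ignore_patterns` setting; the helper `is_interactive` and the example pattern are hypothetical and only illustrate the idea, they are not the extension's own code.

```python
import re

# Hypothetical pattern: keep pages under pandas.errors non-interactive.
IGNORE_PATTERNS = [r"^/docs/reference/api/pandas\.errors\..*"]


def is_interactive(page_path, patterns=IGNORE_PATTERNS):
    """Return True unless the page path matches one of the ignore patterns."""
    return not any(re.search(pattern, page_path) for pattern in patterns)
```

With these assumed patterns, `is_interactive("/docs/reference/api/pandas.read_excel.html")` is true, while a path under `pandas.errors` is filtered out.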
/preview
Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61061/
Sorry @agriyakhetarpal, my bad, I misunderstood what this was doing. I thought "try examples" meant only the terminal we've got in the getting started page, and it clearly didn't seem worth the added complexity and possible problems to me. Letting users run the API examples is something we've wanted for a long time.
I created a new repo pandas-dev/pandas-jupyterlite so we can have the assets and the CI for the interactive terminal. I think it'll make your life easier, and also reduce the maintenance in this repo, which is always appreciated. I gave you and @jtpio permissions there.
I think it'd be good that the warning about reporting bugs points to the new repo. I wouldn't be surprised if we get a decent amount of things not fully supported in wasm. You'll know better than me. It will also help users see if an error has been reported.
Thanks a lot for reconsidering your stance, and no worries about the misunderstanding! I accepted the invitation to collaborate on https://github.com/pandas-dev/pandas-jupyterlite yesterday. I'll initiate work on that repository in a moment, and I agree that issues should be directed towards it – will point to that in the warning text.
I do have to note that this repository would still be limited for use only in the REPL on the Getting Started pages at the moment, and not for the API examples as they use a notebook instead of a REPL.
Currently, all code snippets in Numpydoc-based examples sections are converted to notebooks and assigned UUIDs for filenames, and subsequently included as a part of the built docs. While we can get jupyterlite-sphinx to use an external JupyterLite site instead of it building its own as a part of the documentation builds, there has to be a way to include example snippets from an external site. @jtpio, can there be a way to support the REPL's &code URL parameter for the Notebook interface, too?
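For context, this is roughly how a snippet can be passed to the JupyterLite REPL through its `kernel` and `code` URL parameters today. It is a minimal sketch: the base URL is a placeholder and `build_repl_url` is a hypothetical helper, not part of JupyterLite or jupyterlite-sphinx.

```python
from urllib.parse import quote, urlencode


def build_repl_url(base_url, code, kernel="python"):
    """Build a REPL URL that pre-fills `code` via the query string."""
    # quote_via=quote encodes spaces as %20 and newlines as %0A.
    query = urlencode({"kernel": kernel, "code": code}, quote_via=quote)
    return f"{base_url}?{query}"


# Placeholder deployment URL, used only for illustration.
url = build_repl_url(
    "https://example.org/lite/repl/index.html",
    "import pandas as pd\npd.__version__",
)
```

An equivalent parameter for the Notebook interface would let an externally hosted site receive example code the same way, which is what the question above is asking about.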
To get a preview of the changes here you can run `/preview` in a comment as above. The preview is not automatically updated; you'll have to rerun with the `/preview` comment after making changes.
Noted!
The examples rely on some magic: this code needs to be executed in the session for the examples to work:
main/doc/source/conf.py#L373
Yes, this is also something I think @jtpio would have some better context on. Being able to run an example as-is would require executing code as a part of the kernel, which means we need to pre-install pandas into the environment, which in turn is a long-standing issue for downstream users: https://github.com/jupyterlite/pyodide-kernel/issues/60. The Pyodide kernel doesn't support this, but the Xeus kernel does. Even if we switch to that kernel, we still need an `import pandas as pd` statement to be added to the notebooks[^1].
I noted this above in my self-review at https://github.com/pandas-dev/pandas/pull/61061#pullrequestreview-2662242008, and similar discussions have taken place in NumPy and SciPy. The simpler thing would be for pandas to consider changing its policy and allow the examples to be self-contained. However, I wasn't involved in the previous discussions around this area, so I'm not sure why things are the way they are now.
[^1]: I've been working on allowing notebook modifications for jupyterlite-sphinx lately, which should support this with the next release.
Just a comment for the future, some examples need these files, which I guess should be placed in the wasm container eventually:
main/doc/data
Yes, this should be possible to include – I'll add them. I see that most of these CSV files are needed as a part of the notebooks in the User Guide and not in the API examples, making things easier.
I assumed there was a generic wasm file for all examples, and that clicking on the "try it" button would somehow pass the code to run to it. That's why I was proposing to use an external repo for it. If things are more complex, I guess it's fine to use whatever the sphinx plugin does.
Good point about the data files. I was thinking they were mostly used for the read_whatever methods. They are in some cases, like here: https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html#pandas.read_excel
But it's surely worth leaving out of the wasm the ones only used in the user guide if they won't be runnable.
Finding a solution to run the imports before the examples is clearly a must. It's surely fine to not run them automatically, but to include them in an initial cell in the notebook so users can run them themselves.
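As a rough sketch of the "initial cell" idea, the snippet below prepends an import cell to a notebook represented as a plain nbformat 4 dictionary. The helper `prepend_import_cell` is hypothetical and uses only the standard library; it is not how jupyterlite-sphinx generates or modifies its notebooks.

```python
import copy


def prepend_import_cell(notebook, source="import pandas as pd"):
    """Return a copy of an nbformat 4 notebook dict with an import cell first."""
    import_cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }
    updated = copy.deepcopy(notebook)  # leave the input notebook untouched
    updated["cells"].insert(0, import_cell)
    return updated


# Illustrative notebook holding a single docstring example.
example = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": "pd.Series([1, 2, 3]).sum()",
        }
    ],
}
with_import = prepend_import_cell(example)
```

The import cell is not executed automatically here; users would run it before the example cell, which matches the fallback suggested above.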
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
Yes, this has been on my radar – I'll be returning to this as soon as I can. Thank you!
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
Closing as stale. If you're interested in continuing, merge main and we're happy to reopen.
