DOC: from examples to tutorials
It's awesome to see the work we did at SciPy 2019 finally hit the live docs! Thanks @keewis and @dcherian for pushing it through.
Now that we have these more detailed, realistic examples, let's think about how we can take our documentation to the next level. I think we need TUTORIALS. The examples are a good start. I think we can build on these to create tutorials which walk through most of xarray's core features with domain-specific datasets. We could have different tutorials for different fields. For example:
- Xarray tutorial for meteorology / atmospheric science
- Xarray tutorial for oceanography
- Xarray tutorial for physics (whatever @fujiisoup and @TomNicholas do! 😉 )
- Xarray tutorial for finance (whatever @max-sixty and @crusaderky do! 😉 )
- Xarray tutorial for neuroscience (see nice example from @choldgraf: https://predictablynoisy.com/xarray-explore-ieeg)
Each tutorial would cover the same core elements (loading data, indexing, aligning, grouping, computations, plotting, etc.), but using a familiar, real dataset, rather than the generic, made-up ones in our current docs.
Yes, this would be a lot of work, but I think it would have a huge impact. Just raising here for discussion.
xref #2980 #2378 #3131
In case it's helpful for inspiration, we took a similar approach with the MNE-Python package (neuro electrophysiology package):
https://mne.tools/stable/index.html
Maybe there are at least 3 levels in there, actually:
- Examples - short vignettes that highlight one very specific piece of functionality, key-words for the example should be `ctrl-f`able in the title
- Tutorials - in-depth guides through a common part of workflow that xarray wishes to enable, with more explanation and detail
- Domain use-cases - examples of how xarray can facilitate use-cases in particular fields. Probably cover at a high-level many of the steps that multiple tutorials cover in-depth. More for "inspiration and buy-in" than in-depth learning.
Does that make sense?
@rabernat I'm going to be making a simple plasma physics-oriented xarray tutorial to give at a workshop next week.
I was wondering - if we're uploading real data for these, how big can/should the files be? It might affect what dataset I use.
https://www.divio.com/blog/documentation/ might be a useful reference for this?
> if we're uploading real data for these, how big can/should the files be? It might affect what dataset I use.
This is a good question. We need the tutorials to be able to run and build within a CI environment. That's the main constraint.
For larger datasets, rather than storing them in github, a good approach is to create an archive on https://zenodo.org/ from which the data can be pulled.
> Maybe there are at least 3 levels in there, actually...
The article linked by @keewis is well worth reading in my opinion - it describes a similar breakdown of different types of documentation:
- Tutorials - learning-oriented lessons to get newcomers started,
- How-to guides - goal-oriented series of steps to solve a specific problem,
- Explanation - understanding-oriented discussion providing background and context,
- Reference - information-oriented description of technical machinery.
I think for xarray there is another type, like you suggest @choldgraf:
- Domain use-cases (/inspiration/showing-off) - showcase-oriented examples of groups using xarray in anger to do something cool.
I personally think xarray in general has reference nailed, lots of good explanation, but is generally a bit weaker on tutorials and how-to guides, and doesn't have many examples of domain use-cases.
I have some ideas for how-to's (maybe these should all go in a separate issue?):
- How to migrate from numpy to xarray - Huge numbers of numpy users need to be shown exactly what code should be replaced with what, and what they can then stop worrying about.
- How to apply your own analysis functions - i.e. an `apply_ufunc` how-to. The existing documentation on that is more along the lines of an explanation in my opinion, and I've certainly found `apply_ufunc` to have a steep learning curve.
- How to organise domain-specific functionality - In-depth guide to various tricks you can pull with accessors, and when you might want to go beyond that. The documentation we have on that only shows a couple of possible approaches.
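As a rough illustration of what the first two how-to's could contain, here is a minimal sketch that puts a few numpy idioms next to their xarray counterparts and then wraps a plain numpy function with `apply_ufunc`. The data, the dimension and coordinate names, and the `peak_to_peak` helper are all made up for illustration; they are assumptions, not anything from the existing docs.

```python
import numpy as np
import xarray as xr

# Made-up data: 4 time steps on a 2 x 3 (lat, lon) grid.
values = np.arange(24, dtype=float).reshape(4, 2, 3)

# numpy: you have to remember that axis 0 is time.
np_time_mean = values.mean(axis=0)

# xarray: name the dimensions once, then refer to them by name.
da = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={"lat": [10.0, 20.0], "lon": [100.0, 110.0, 120.0]},
    name="temperature",
)
xr_time_mean = da.mean(dim="time")

# numpy: positional indexing. xarray: label-based selection.
np_point = values[:, 1, 2]
xr_point = da.sel(lat=20.0, lon=120.0)


def peak_to_peak(arr, axis=-1):
    """Hypothetical analysis function that knows nothing about xarray."""
    return arr.max(axis=axis) - arr.min(axis=axis)


# apply_ufunc moves the core dimension ("time") to the last axis,
# calls the function on the underlying numpy array, and re-attaches
# the remaining labelled dimensions to the result.
time_range = xr.apply_ufunc(
    peak_to_peak,
    da,
    input_core_dims=[["time"]],
    kwargs={"axis": -1},
)
```

The point a how-to could make is that the numpy function itself does not change; only the call site has to say which named dimension the function consumes.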
> We need the tutorials to be able to run and build within a CI environment.
So @rabernat for small datasets what might be an appropriate max filesize? I literally have no idea. ~1MB?
> a good approach is to create an archive on https://zenodo.org/
I'll look into that.
> For larger datasets, rather than storing them in github, a good approach is to create an archive on zenodo.org from which the data can be pulled.
Another note from MNE - we have a "datasets" sub-module that knows how to pull a few datasets from various online repositories (and in different structures). These are stored in a local folder (by default `~/mne_data`, I believe) and then they get fast-loaded after the first download. Many of the datasets are then stored in online repositories like OSF (https://osf.io/rxvq7/).
For datasets that aren't gigantic it's a pretty nice system. https://mne.tools/stable/overview/datasets_index.html?highlight=datasets
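To make the "download once, then load from a local folder" idea concrete, here is a minimal sketch using only the Python standard library. It is not how MNE implements its datasets sub-module; the `fetch` function, its parameters, and the `.part` temporary-file convention are assumptions made for illustration.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch(url, filename, cache_dir, sha256=None):
    """Download `url` into `cache_dir` once and reuse the local copy afterwards.

    Hypothetical helper: the name and signature are illustrative only.
    """
    cache_dir = Path(cache_dir).expanduser()
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename

    if not target.exists():
        # First call: download to a temporary name and rename afterwards,
        # so an interrupted download is never mistaken for a complete file.
        partial = target.with_name(target.name + ".part")
        urllib.request.urlretrieve(url, partial)
        partial.replace(target)

    if sha256 is not None:
        # Optional integrity check against a known SHA-256 digest.
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if digest != sha256:
            raise ValueError(f"checksum mismatch for {target}")

    return target
```

A tutorial would call something like this with the URL of a file in a Zenodo or OSF archive and pass the returned path to `xr.open_dataset`. In CI the cache folder can be preserved between runs so the download only happens once. Libraries such as pooch package this same pattern with more safeguards.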
Hello everyone, is this issue still relevant? I could add a domain use-case for oceanography or meteorology, but it seems like that has already been done under:
- getting started -> examples -> ROMS Ocean Model Example
- getting started -> examples -> Calculating Seasonal Averages from Time Series of Monthly Means
So there's no need to work on domain use-cases for oceanography or meteorology, is that correct?
Also, I'd be happy to contribute something on how to migrate from numpy to xarray, if that is still needed.
Hi @apkrelling, thanks for offering to help!
I think we can still add more domain-specific examples for meteorology and oceanography. @rabernat had some plans for this, maybe he can describe them.
> how to migrate from numpy to xarray, if that is still needed.
This would be totally great!
Hey everyone!
Is there any way to change or reorder the season labels ['DJF', 'JJA', 'MAM', 'SON'] during seasonal grouping? I would like to change the 'DJF', 'JJA', 'MAM', 'SON' combination and define the winter season as Dec+Jan+Feb+Mar.
Your assistance is highly appreciated.
@hafez-ahmad can you ask this question in Discussions? https://github.com/pydata/xarray/discussions
We've started discussing how to reorganize the xarray-tutorial repository here: https://github.com/xarray-contrib/xarray-tutorial/issues/53 . Comments are welcome!
Hi folks,
Just to mention that we've created a short tutorial on xarray which is meant as a gentle intro for folks coming from the malaria genetics field, who mostly have never heard of xarray before. We illustrate xarray first using outputs from a geostatistical model of how insecticide-treated bednets are used in Africa. We then give a couple of brief examples of how we use xarray for genomic data. There are video walkthroughs in French and English:
https://anopheles-genomic-surveillance.github.io/workshop-5/module-1-xarray.html
Please feel free to link to this in the xarray tutorial site if you'd like to :)
@choldgraf seems like this page is down (https://predictablynoisy.com/xarray-explore-ieeg). Are these examples available elsewhere?
Oops, I think the URL just changed:
https://chrisholdgraf.com/blog/2019/2019-10-22-xarray-neuro/