DOC: from examples to tutorials

Open rabernat opened this issue 4 years ago • 12 comments

It's awesome to see the work we did at Scipy2019 finally hit the live docs! Thanks @keewis and @dcherian for pushing it through.

Now that we have these more detailed, realistic examples, let's think about how we can take our documentation to the next level. I think we need TUTORIALS. The examples are a good start, and we can build on them to create tutorials that walk through most of xarray's core features with domain-specific datasets. We could have different tutorials for different fields. For example:

  • Xarray tutorial for meteorology / atmospheric science
  • Xarray tutorial for oceanography
  • Xarray tutorial for physics (whatever @fujiisoup and @TomNicholas do! 😉 )
  • Xarray tutorial for finance (whatever @max-sixty and @crusaderky do! 😉 )
  • Xarray tutorial for neuroscience (see nice example from @choldgraf: https://predictablynoisy.com/xarray-explore-ieeg)

Each tutorial would cover the same core elements (loading data, indexing, aligning, grouping, computations, plotting, etc.), but using a familiar, real dataset, rather than the generic, made-up ones in our current docs.
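To make the list of core elements concrete, here is a minimal sketch of what one pass through them might look like. It uses a small synthetic array purely as a stand-in (the variable names, grid, and values are all made up for illustration); a real tutorial would load a familiar domain dataset instead.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a real dataset: two years of monthly "temperature"
# on a tiny lat/lon grid. Every name and value here is made up.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
rng = np.random.default_rng(0)
temp = xr.DataArray(
    15 + rng.standard_normal((24, 3, 2)),
    coords={"time": time, "lat": [10.0, 20.0, 30.0], "lon": [100.0, 110.0]},
    dims=("time", "lat", "lon"),
    name="temp",
)

# Indexing: select by coordinate label rather than integer position
point = temp.sel(lat=20.0, lon=110.0)

# Aligning: arithmetic matches on coordinate labels, so only the
# latitude shared by both operands survives (inner join by default)
diff = temp.sel(lat=[10.0, 20.0]) - temp.sel(lat=[20.0, 30.0])

# Grouping + computation: mean annual cycle over the two years
monthly_clim = temp.groupby("time.month").mean("time")

# Plotting (requires matplotlib): point.plot()
```

Each domain tutorial could follow the same skeleton and only swap the data-loading step and the variable names.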

Yes, this would be a lot of work, but I think it would have a huge impact. Just raising here for discussion.

xref #2980 #2378 #3131

rabernat avatar Nov 22 '19 17:11 rabernat

In case it's helpful for inspiration, we took a similar approach with the MNE-Python package (neuro electrophysiology package):

https://mne.tools/stable/index.html

Maybe there are at least 3 levels in there, actually:

  • Examples - short vignettes that highlight one very specific piece of functionality, keywords for the example should be ctrl-f-able in the title
  • Tutorials - in-depth guides through a common part of workflow that xarray wishes to enable, with more explanation and detail
  • Domain use-cases - examples of how xarray can facilitate use-cases in particular fields. Probably cover at a high-level many of the steps that multiple tutorials cover in-depth. More for "inspiration and buy-in" than in-depth learning.

Does that make sense?

choldgraf avatar Nov 22 '19 18:11 choldgraf

@rabernat I'm going to be making a simple plasma physics-oriented xarray tutorial to give at a workshop next week.

I was wondering - if we're uploading real data for these, how big can/should the files be? It might affect what dataset I use.

TomNicholas avatar Dec 03 '19 15:12 TomNicholas

https://www.divio.com/blog/documentation/ might be a useful reference for this?

keewis avatar Dec 13 '19 16:12 keewis

if we're uploading real data for these, how big can/should the files be? It might affect what dataset I use.

This is a good question. We need the tutorials to be able to run and build within a CI environment. That's the main constraint.

For larger datasets, rather than storing them in github, a good approach is to create an archive on https://zenodo.org/ from which the data can be pulled.

rabernat avatar Dec 13 '19 16:12 rabernat

Maybe there are at least 3 levels in there, actually...

The article linked by @keewis is well worth reading in my opinion - it describes a similar breakdown of different types of documentation:

  • Tutorials - learning-oriented lessons to get newcomers started,
  • How-to guides - goal-oriented series of steps to solve a specific problem,
  • Explanation - understanding-oriented discussion providing background and context,
  • Reference - information-oriented description of technical machinery.

I think for xarray there is another type, like you suggest @choldgraf:

  • Domain use-cases (/inspiration/showing-off) - showcase-oriented examples of groups using xarray in anger to do something cool.

I personally think xarray in general has reference nailed, lots of good explanation, but is generally a bit weaker on tutorials and how-to guides, and doesn't have many examples of domain use-cases.


I have some ideas for how-to's (maybe these should all go in a separate issue?):

  • How to migrate from numpy to xarray - Huge numbers of numpy users need to be shown exactly what code should be replaced with what, and what they can then stop worrying about.
  • How to apply your own analysis functions - i.e. apply_ufunc how-to. The existing documentation on that is more along the lines of an explanation in my opinion, and I've certainly found apply_ufunc to have a steep learning curve.
  • How to organise domain-specific functionality - In-depth guide to various tricks you can pull with accessors, and when you might want to go beyond that. The documentation we have on that only shows a couple of possible approaches.
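As a rough illustration of what each of these three how-tos might open with, here is a small sketch. The array, the `peak_to_peak` helper, and the `demo` accessor name are all hypothetical and chosen only for the example; the xarray calls themselves (`xr.apply_ufunc`, `xr.register_dataarray_accessor`) are the existing public API.

```python
import numpy as np
import xarray as xr

# 1. numpy -> xarray: named dimensions replace axis numbers
data = np.arange(12.0).reshape(3, 4)   # axis 0 is "time" by convention only
np_mean = data.mean(axis=0)            # reader must remember what axis 0 means

da = xr.DataArray(data, dims=("time", "x"), coords={"x": [0, 10, 20, 30]})
xr_mean = da.mean("time")              # the intent is explicit


# 2. apply_ufunc: wrap a plain numpy function
def peak_to_peak(arr, axis=-1):
    """Range (max - min) along one axis of a numpy array."""
    return arr.max(axis=axis) - arr.min(axis=axis)


ptp = xr.apply_ufunc(
    peak_to_peak,
    da,
    input_core_dims=[["time"]],  # "time" is moved to the last axis and consumed
    kwargs={"axis": -1},
)


# 3. accessors: attach domain-specific methods under a namespace
@xr.register_dataarray_accessor("demo")  # hypothetical accessor name
class DemoAccessor:
    def __init__(self, xarray_obj):
        self._obj = xarray_obj

    def anomaly(self, dim):
        """Deviation from the mean along `dim`."""
        return self._obj - self._obj.mean(dim)


anom = da.demo.anomaly("time")
```

A full how-to would go much further (dask support and output dimensions for `apply_ufunc`, accessor state and composition, and so on), but even a side-by-side this small shows what a numpy user can stop tracking by hand.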

We need the tutorials to be able to run and build within a CI environment.

So @rabernat for small datasets what might be an appropriate max filesize? I literally have no idea. ~1MB?

a good approach is to create an archive on https://zenodo.org/

I'll look into that.

TomNicholas avatar Dec 13 '19 17:12 TomNicholas

For larger datasets, rather than storing them in github, a good approach is to create an archive on zenodo.org from which the data can be pulled.

Another note from MNE - we have a "datasets" sub-module that knows how to pull a few datasets from various online repositories (and in different structures). These store in a local folder (by default, ~/mne_data I believe) and then they get fast-loaded after the first download. Many of the datasets are then stored in online repositories like OSF (https://osf.io/rxvq7/).

For datasets that aren't gigantic it's a pretty nice system. https://mne.tools/stable/overview/datasets_index.html?highlight=datasets
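For reference, the download-once-then-reuse pattern described above can be sketched with only the standard library. This is not MNE's or xarray's actual implementation; `fetch`, its parameters, and the cache layout are assumptions made up for illustration, and a dedicated library such as pooch covers the same ground more thoroughly.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch(url, fname, cache_dir, sha256=None):
    """Download `url` into `cache_dir` once; reuse the local copy afterwards.

    Hypothetical helper: the name and signature are illustrative only.
    """
    cache_dir = Path(cache_dir).expanduser()
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / fname

    # Only hit the network (or any remote store) when the file is not cached
    if not target.exists():
        with urllib.request.urlopen(url) as response:
            target.write_bytes(response.read())

    # Optional integrity check against a known SHA-256 hex digest
    if sha256 is not None:
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if digest != sha256:
            raise ValueError(f"checksum mismatch for {target}")
    return target
```

A tutorial notebook would call something like this with the archive's file URL and a cache directory, then open the returned path; in CI the cache directory can be persisted between runs so the download happens rarely.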

choldgraf avatar Dec 13 '19 20:12 choldgraf

Hello everyone, is this issue still relevant? I could add a domain-use case for oceanography or meteorology, but it seems like that has already been done under

  • getting started -> examples -> ROMS Ocean Model Example
  • getting started -> examples -> Calculating Seasonal Averages from Time Series of Monthly Means
  1. So there's no need to work on domain-use cases for oceanography or meteorology, is that correct?

  2. Also, I'd be happy to contribute with something about how to migrate from numpy to xarray, if that is still needed.

apkrelling avatar Apr 01 '21 22:04 apkrelling

Hi @apkrelling thanks for offering to help!

I think we can still add more domain-specific examples for meteorology and oceanography. @rabernat had some plans for this, maybe he can describe them.

how to migrate from numpy to xarray, if that is still needed.

This would be totally great!

dcherian avatar Apr 02 '21 19:04 dcherian

Hey everyone!

Is there any way to change or reorder the season names ['DJF' 'JJA' 'MAM' 'SON'] during seasonal grouping? I would like to change the 'DJF' 'JJA' 'MAM' 'SON' combination and define the winter season as Dec+Jan+Feb+Mar.

Your assistance is highly appreciated.
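One possible way to express the grouping asked about here, as a sketch only: build a custom label for each time step and group by that instead of the built-in `time.season`. The data below is synthetic and the label names ("winter", "other") are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly series, Jan 2000 to Dec 2001, with values 0..23
time = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), coords={"time": time}, dims="time", name="x")

# Mark the months that make up the custom winter season
winter_months = [12, 1, 2, 3]
is_winter = da["time"].dt.month.isin(winter_months)

# One label per time step; the labels themselves are arbitrary strings
season = xr.DataArray(
    np.where(is_winter.values, "winter", "other"),
    coords={"time": time},
    dims="time",
    name="season",
)

# Group by the custom labels instead of "time.season"
seasonal_mean = da.groupby(season).mean()

# Or keep only the winter time steps
winter_only = da.where(is_winter, drop=True)
```

Note that this pools all Decembers with all January to March months across years; averaging each individual winter (December together with the following January to March) would need an additional per-winter year label.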

hafez-ahmad avatar Apr 08 '21 14:04 hafez-ahmad

@hafez-ahmad can you ask this question in Discussions? https://github.com/pydata/xarray/discussions

dcherian avatar Apr 08 '21 14:04 dcherian

We've started discussing how to reorganize the xarray-tutorial repository here: https://github.com/xarray-contrib/xarray-tutorial/issues/53 . Comments are welcome!

dcherian avatar Apr 26 '22 15:04 dcherian

Hi folks,

Just to mention that we've created a short tutorial on xarray which is meant as a gentle intro for folks coming from the malaria genetics field, who mostly have never heard of xarray before. We illustrate xarray first using outputs from a geostatistical model of how insecticide-treated bednets are used in Africa. We then give a couple of brief examples of how we use xarray for genomic data. There are video walkthroughs in French and English:

https://anopheles-genomic-surveillance.github.io/workshop-5/module-1-xarray.html

Please feel free to link to this in the xarray tutorial site if you'd like to :)

alimanfoo avatar Jul 20 '22 09:07 alimanfoo

@choldgraf seems like this page is down (https://predictablynoisy.com/xarray-explore-ieeg). Are these examples available elsewhere?

ddjustina avatar Feb 21 '23 19:02 ddjustina

Oops, I think the URL just changed:

https://chrisholdgraf.com/blog/2019/2019-10-22-xarray-neuro/

choldgraf avatar Feb 21 '23 20:02 choldgraf