jose-reviews [REVIEW]: The University of Toronto Climate Downscaling Workflow: Tools and Resources for Climate Change Impact Analysis

Submitting author: @mikemorris12 (Michael Morris) Repository: https://github.com/mikemorris12/UTCDW_Guidebook/ Branch with paper.md (empty if default branch): master Version: v1.1.1 Editor: @acocac Reviewers: @aulemahal, @brian-rose Archive: 10.5281/zenodo.12785645 Paper kind: learning module

Status

Status badge code:

HTML: <a href="https://jose.theoj.org/papers/2c5db4cc1b1668693b3aabb62501fe58"><img src="https://jose.theoj.org/papers/2c5db4cc1b1668693b3aabb62501fe58/status.svg"></a>
Markdown: [![status](https://jose.theoj.org/papers/2c5db4cc1b1668693b3aabb62501fe58/status.svg)](https://jose.theoj.org/papers/2c5db4cc1b1668693b3aabb62501fe58)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@aulemahal & @brian-rose, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://openjournals.readthedocs.io/en/jose/reviewer_guidelines.html. Any questions/concerns please let @acocac know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @aulemahal

📝 Checklist for @brian-rose

May 02 '24 08:05 editorialbot

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

May 02 '24 08:05 editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1017/9781108601269 is OK
- 10.21105/jose.00100 is OK
- 10.1017/9781107588783 is OK
- 10.5194/gmd-9-1937-2016 is OK

MISSING DOIs

- No DOI given, and none found for title: XClim Official Documentation

INVALID DOIs

- None

May 02 '24 08:05 editorialbot

Software report:

github.com/AlDanial/cloc v 1.90  T=0.42 s (576.3 files/s, 421510.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
HTML                            41           6668            288          27461
Jupyter Notebook                83              0          88840          17821
JavaScript                      21           4573           4489          16561
CSS                             12            811            115           2904
PO File                         45           1007              0           2284
YAML                             6              8              6           1203
TeX                              2             32              0            494
Markdown                        25            316              0            447
Python                           2            119            281            250
SVG                              5              0              1             29
-------------------------------------------------------------------------------
SUM:                           242          13534          94020          69454
-------------------------------------------------------------------------------

Commit count by author:

   156	mike-morris-codes
    15	Michale Morris

May 02 '24 08:05 editorialbot

Paper file info:

📄 Wordcount for paper.md is 1069

✅ The paper includes a Statement of need section

May 02 '24 08:05 editorialbot

License info:

🔴 License found: Creative Commons Attribution Share Alike 4.0 International (Not OSI approved)

May 02 '24 08:05 editorialbot

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

May 02 '24 08:05 editorialbot

👋 @aulemahal @brian-rose we will conduct the review in this issue.

Please read through the above information and let me know if you have any questions about the review process.

Thank you 🙏

May 02 '24 08:05 acocac

Review checklist for @aulemahal

Conflict of interest

[x] As the reviewer I confirm that I have read the JOSE conflict of interest policy and that there are no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSE code of conduct.

General checks

[x] Repository: Is the source for this learning module available at the https://github.com/mikemorris12/UTCDW_Guidebook/?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of a standard license? (OSI-approved for code, Creative Commons for content)
[x] Version: Does the release version given match the repository release?
[x] Authorship: Has the submitting author (@mikemorris12) made visible contributions to the module? Does the full list of authors seem appropriate and complete?

Documentation

[x] A statement of need: Do the authors clearly state the need for this module and who the target audience is?
[x] Installation instructions: Is there a clearly stated list of dependencies?
[x] Usage: Does the documentation explain how someone would adopt the module, and include examples of how to use it?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the module 2) Report issues or problems with the module 3) Seek support

Pedagogy / Instructional design (Work-in-progress: reviewers, please comment!)

[x] Learning objectives: Does the module make the learning objectives plainly clear? (We don't require explicitly written learning objectives; only that they be evident from content and design.)
[x] Content scope and length: Is the content substantial for learning a given topic? Is the length of the module appropriate?
[x] Pedagogy: Does the module seem easy to follow? Does it observe guidance on cognitive load? (working memory limits of 7 +/- 2 chunks of information)
[x] Content quality: Is the writing of good quality, concise, engaging? Are the code components well crafted? Does the module seem complete?
[x] Instructional design: Is the instructional design deliberate and apparent? For example, exploit worked-example effects; effective multi-media use; low extraneous cognitive load.

JOSE paper

[x] Authors: Does the paper.md file include a list of authors with their affiliations?
[x] A statement of need: Does the paper clearly state the need for this module and who the target audience is?
[x] Description: Does the paper describe the learning materials and sequence?
[x] Does it describe how it has been used in the classroom or other settings, and how someone might adopt it?
[x] Could someone else teach with this module, given the right expertise?
[x] Does the paper tell the "story" of how the authors came to develop it, or what their expertise is?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

May 24 '24 17:05 aulemahal

@editorialbot commands

Jun 03 '24 15:06 acocac

Hello @acocac, here are the things you can ask me to do:


# List all available commands
@editorialbot commands

# Add to this issue's reviewers list
@editorialbot add @username as reviewer

# Remove from this issue's reviewers list
@editorialbot remove @username from reviewers

# Get a list of all editors's GitHub handles
@editorialbot list editors

# Assign a user as the editor of this submission
@editorialbot assign @username as editor

# Remove the editor assigned to this submission
@editorialbot remove editor

# Remind an author, a reviewer or the editor to return to a review after a 
# certain period of time (supported units days and weeks)
@editorialbot remind @reviewer in 2 weeks

# Adds a checklist for the reviewer using this command
@editorialbot generate my checklist

# Set a value for version
@editorialbot set v1.0.0 as version

# Set a value for branch
@editorialbot set jose-paper as branch

# Set a value for repository
@editorialbot set https://github.com/organization/repo as repository

# Set a value for the archive DOI
@editorialbot set 10.5281/zenodo.6861996 as archive

# Mention the EiCs for the correct track
@editorialbot ping track-eic

# Run checks and provide information on the repository and the paper file
@editorialbot check repository

# Check the references of the paper for missing DOIs
@editorialbot check references

# Generates the pdf paper
@editorialbot generate pdf

# Recommends the submission for acceptance
@editorialbot recommend-accept

# Generates a LaTeX preprint file
@editorialbot generate preprint

# Flag submission with questionable scope
@editorialbot query scope

# Get a link to the complete list of reviewers
@editorialbot list reviewers

# Creates a post-review checklist with editor and authors tasks
@editorialbot create post-review checklist

# Open the review issue
@editorialbot start review

Jun 03 '24 15:06 editorialbot

@editorialbot remind @brian-rose in 1 day

Jun 03 '24 15:06 acocac

Reminder set for @brian-rose in 1 day

Jun 03 '24 15:06 editorialbot

Hi @aulemahal, thanks for going through the review check list. May I ask if you can expand further (add comments) of the unchecked items, particularly in the pedagogy section? This info might be helpful for the authors to implement changes if necessary. Thank you.

Jun 03 '24 15:06 acocac

Hi! My review is still in progress. I have gone through the 3 first chapters of the document, I plan to finish the review this week. Sorry for the delay!

Jun 03 '24 16:06 aulemahal

Hi! My review is still in progress. I have gone through the 3 first chapters of the document, I plan to finish the review this week. Sorry for the delay!

Thanks for the update 🙏

Jun 04 '24 11:06 acocac

:wave: @brian-rose, please update us on how your review is going (this is an automated reminder).

Jun 04 '24 15:06 editorialbot

Review checklist for @brian-rose

Conflict of interest

[x] As the reviewer I confirm that I have read the JOSE conflict of interest policy and that there are no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSE code of conduct.

General checks

[x] Repository: Is the source for this learning module available at the https://github.com/mikemorris12/UTCDW_Guidebook/?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of a standard license? (OSI-approved for code, Creative Commons for content)
[x] Version: Does the release version given match the repository release?
[x] Authorship: Has the submitting author (@mikemorris12) made visible contributions to the module? Does the full list of authors seem appropriate and complete?

Documentation

[x] A statement of need: Do the authors clearly state the need for this module and who the target audience is?
[x] Installation instructions: Is there a clearly stated list of dependencies?
[x] Usage: Does the documentation explain how someone would adopt the module, and include examples of how to use it?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the module 2) Report issues or problems with the module 3) Seek support

Pedagogy / Instructional design (Work-in-progress: reviewers, please comment!)

[x] Learning objectives: Does the module make the learning objectives plainly clear? (We don't require explicitly written learning objectives; only that they be evident from content and design.)
[x] Content scope and length: Is the content substantial for learning a given topic? Is the length of the module appropriate?
[x] Pedagogy: Does the module seem easy to follow? Does it observe guidance on cognitive load? (working memory limits of 7 +/- 2 chunks of information)
[x] Content quality: Is the writing of good quality, concise, engaging? Are the code components well crafted? Does the module seem complete?
[x] Instructional design: Is the instructional design deliberate and apparent? For example, exploit worked-example effects; effective multi-media use; low extraneous cognitive load.

JOSE paper

[x] Authors: Does the paper.md file include a list of authors with their affiliations?
[x] A statement of need: Does the paper clearly state the need for this module and who the target audience is?
[x] Description: Does the paper describe the learning materials and sequence?
[x] Does it describe how it has been used in the classroom or other settings, and how someone might adopt it?
[x] Could someone else teach with this module, given the right expertise?
[x] Does the paper tell the "story" of how the authors came to develop it, or what their expertise is?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Jun 07 '24 18:06 brian-rose

A couple of quick issues I noted in a first pass, before taking a deeper dive into the content:

The repository does not have any releases, so it does not match the given v1.0 version number.
One of the references in the JOSE paper is to the documentation for the xclim package. It's a valid link but not a DOI. The suggestions within the xclim docs is to cite their JOSS paper (https://doi.org/10.21105/joss.05415). I think that would be a better choice.

Jun 07 '24 18:06 brian-rose

I have finished going through the guidebook!

From the review checklist, the only remaining "issue" is indeed that there seems to be no versioning on the guidebook repo. Otherwise, I find this document to be quite interesting. It is well written and complete. It is long, but (almost) no parts seemed superfluous. The topic is complex and to learn both the theory and the practical stuff, a user needs to take their time going though the book.

The following comments are not really publication-blocking to me.

The only section I found tedious was the data download. Station stuff is concise enough with ec3 and the usage of siphon on PAVICS is clear too. However, all those custom functions with request for the other dataset make the code look overly complex (too me). I would suggest the following ideas to make this part less cumbersome:

Use intake-esm with Google's CMIP6 catalog. See this doc
Redirect the readers to the HTML interfaces of ESGF and quickly show how to download the netCDFs by hand or by generated wget scripts (or even globus).

All my other comments are on details of the python code. Should I put them elsewhere ? As issues in the original repo ?

I think there is too much usage of pandas. Xarray and Pandas have very similar but different APIs, as to not confuse the readers, I would suggest showing early in the document how to transform DataFrames into Datasets cleanly and only computing and plotting with xarray from then on.
Similarly, dask.array.apply_along_axis is used in many places. I would suggest using xr.apply_ufunc instead (with vectorize=True). Also, the first place it is used (section 3.2), the data isn't even in a dask array.
OpenDAP : It is mentionned that OpenDAP and PAVICS's THREDDS server are limited when it comes to large datasets. That's only true because no chunking is being given to xr.open_dataset. PAVICS suggests some chunking for its principal datasets and for others, chunks='auto' might be ok for what this guidebook tries to show.
For the observations, you could explore using the AHCCD dataset on PAVICS instead of downloading raw station data.
However, this previous suggestion would require some quick explanation on how dask works and how to configure it. Showing to do dask.config.set(num_workers=XX) might save some headaches and machine jamming. A bonus section or links to tutorials about the distributed Client (and its dashboard!) might be useful.
PCIC's download URL is down. Their website has been reorganized and the data is now accessible through an interactive portal (which resonates with my previous point).
xclim.ensembles.create_ensemble can merge datasets along dimensions like time, the input arguments simply need to be given appropriately.
Please use xclim.atmos instead of xclim.indices.
Some cells have long content. Consider making multiline calls, as when reading the generated HTML, it is sometimes nicer if the user doesn't have to scroll left and right.

Jun 14 '24 21:06 aulemahal

Small grammatical fix in the JOSE paper text: in the Guidebook Content section, line 51 of the rendered PDF. Replace

Chapter 2 explains how climate change projections are made and what their limitations.

with

Chapter 2 explains how climate change projections are made and what their limitations are.

Jun 17 '24 14:06 brian-rose

I have finished going through the entire guidebook.

Overall I found the material to be well-organized and readable. The scope seems reasonable relative to the stated needs and audience. It's not a short read! But seems like a valuable teaching resource. Nice work!

Although complete reproducibility of the code examples doesn't seem to be in the list of review requirements, I thought it would be useful to take this opportunity to run through all the content and flag anything that needs updating. I opened a few issues (linked above) describing the problems I encountered. At least one of these issues (with the PCIC data access) was already flagged by @aulemahal. The issue with unit errors in xlim is causing failures in several notebooks.

I tend to agree with @aulemahal about the rather cumbersome data downloads. There is an actively maintained tutorial on accessing CMIP6 data on Google cloud with intake-esm in the Project Pythia CMIP6 Cookbook, perhaps this is helpful.

In terms of checklist items, I left unchecked "Version", "Installation Instructions" and "References" for reasons described above.

Jun 17 '24 19:06 brian-rose

Thanks @brian-rose and @aulemahal for your helpful feedback! Apologies for all of the Python bugs, we've had quite a few users run the code without error so I wonder if there are package version issues that need to be sorted - I'll look into it. @acocac how do we proceed from here? I'm not used to this sort of open review process, so do I start making the revisions right away or will you write an editor's decision like a traditional journal? I'll wait to hear from you before I proceed.

Jun 19 '24 13:06 mikemorris12

Thanks @brian-rose and @aulemahal for your helpful feedback! Apologies for all of the Python bugs, we've had quite a few users run the code without error so I wonder if there are package version issues that need to be sorted - I'll look into it. @acocac how do we proceed from here? I'm not used to this sort of open review process, so do I start making the revisions right away or will you write an editor's decision like a traditional journal? I'll wait to hear from you before I proceed.

Hi @mikemorris12, following the review process in the Submitting a paper to JOSE section, you should:

Implement changes in the JOSE-submission branch.
Respond to reviewer-raised issues (if any are raised) on the submission repository’s issue tracker. Reviewer and editor contributions, like any other contributions, should be acknowledged in the repository i.e. tag @[reviewer-github-handle] in commit messages.
Upon successful completion of the review, merge the JOSE-submission branch into master and release a new version. I suggest Version 1.1.0 due to the minor nature of the comments (e.g. improve the data download using intake-esm). Please refer to semantic versioning 2.0.0 for further info.
Remember to deposit a copy of your (updated) repository with a data-archiving service such as Zenodo or figshare, get a DOI for the archive, and update the review issue thread with the DOI.
Tag me when you complete above steps. Then, I'll suggest the reviewers to revisit their checklists.

Jun 21 '24 08:06 acocac

Hi @acocac, I think I've addressed each of the reviewer’s comments. Here is a line-by-line response:

For @aulemahal ‘s comments:

Data download: Instead of replacing the esgf-pyclient and Google Cloud data access tutorials, I’ve added a third one (Section 3.6) which shows users how to use intake-esm to access data. I’ve also mentioned in the preamble of Section 3.4 (the esgf-pyclient and model analysis tutorial) that these other options exist and are easier to use.
Section 3.4 links to the ESGF web interface for searching for data.
Use of pandas: I agree that xarray is the standard (and superior) package for working with climate data. However, many new users are experienced working with tabular data but have no experience with xarray, which is why we started the tutorial using pandas to manipulate the station data.
All instances of dask.array.apply_along_axis have been replaced with xr.apply_ufunc.
I’ve added mention in Section 3.2 that using chunks=’auto’ allows the user to load larger amounts of data from PAVICS via OpenDAP.
I’ve added instructions on accessing the AHCCD data to Section 3.1.
Section 3.2 links to the xarray and dask documentation on parallelization.
I’ve changed the PCIC data URL to the new one.
I tried using xclim.ensembles.create_ensemble to merge two datasets along the time dimension, but I couldn’t get it to work properly. The resulting dataset had the wrong time values for the latter half of the record. For now, I’ve mentioned that it’s possible, but I can’t add instructions on how to do it since I couldn’t get it to work.
All uses of xclim.indices have been switched to xclim.indicators.atmos.
I’ve shortened many lines which were previously too long and required scrolling.

For @brian-rose ‘s comments:

I’ve fixed the grammatical error and the xclim citation in the manuscript.
I’ve responded to the issues opened on the UTCDW Guidebook repository; they’ve all been closed.
I’ve added Section 3.6 which shows users how to access CMIP6 data using intake-esm.

A question for @acocac is when exactly I should merge the changes into master, create the new release, and deposit the repo in Zenodo. You said, “tag me when you complete the above steps”, but you also said that branch merge and the Zenodo deposit should take place after successful completion of the review. I assume that the reviewers need to revisit their checklists before the review is considered complete. Please advise on how to proceed.

Jun 27 '24 19:06 mikemorris12

A question for @acocac is when exactly I should merge the changes into master, create the new release, and deposit the repo in Zenodo. You said, “tag me when you complete the above steps”, but you also said that branch merge and the Zenodo deposit should take place after successful completion of the review. I assume that the reviewers need to revisit their checklists before the review is considered complete. Please advise on how to proceed.

@mikemorris12 I'll ask the reviewers to confirm they are satisfied and recommend publication. Then, you should complete the final steps incl. branch merge.

Jul 01 '24 18:07 acocac

@brian-rose @aulemahal - can you confirm if the above comments/implementations by the author are ok for you? Would you recommend the work for publication?

Thank you!

Jul 01 '24 19:07 acocac

@editorialbot generate pdf

Jul 01 '24 19:07 brian-rose

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Jul 01 '24 19:07 editorialbot

@acocac I confirm that I am satisfied with the author's revisions, and I recommend this work for publication in JOSE!

Jul 01 '24 20:07 brian-rose

@acocac I also confirm that I am satisfied. And I recommend this work for publication in JOSE.

Jul 02 '24 19:07 aulemahal