NSIDC-Data-Tutorials

initial commit of cloud tutorial notebook

Open andypbarrett opened this issue 3 years ago • 1 comments

Add tutorial notebook for UWG cloud demo. This works on the 2i2c Openscapes instance

andypbarrett avatar May 19 '22 23:05 andypbarrett

Binder :point_left: Launch a binder notebook on this branch for commit a543c85c085f0862968f3c6e3d9ab4c82fdc84f3

I will automatically update this comment whenever this PR is modified

github-actions[bot] avatar Mar 21 '23 18:03 github-actions[bot]

@andypbarrett I didn't realize there was a pull request already open on this branch. Hopefully my commits are descriptive enough, but essentially I just updated the notebook to match our tutorial template and changed earthdata code to earthaccess. I haven't added a rendered version of the notebook yet, as I wanted to get your feedback/suggestions first. Thanks!

jroebuck932 avatar Mar 21 '23 18:03 jroebuck932

I think the text, especially in the overview section, could be reworked and reordered. There is quite a bit of repetition. I think repetition is OK where it reinforces concepts but it is not always necessary.

I'm struggling with what the main objective of this notebook is. On the one hand, it demonstrates using earthaccess to search for and access cloud-hosted data. On the other hand, it demonstrates how to open ICESat-2 data in HDF5 format using xarray. The plotting bit is an add-on to round out the workflow. I think it would be helpful to highlight the generic nature of the search and access part. We also need to convey that the user is in the cloud and the data is in the cloud.

I have three possible titles.

  1. "Accessing and working with data in the cloud: an example using ICESat-2."
  2. "Accessing and working with ICESat-2 data in the cloud".
  3. "Accessing and working with cloud-hosted ICESat-2 data in the cloud".

The first paragraph needs to be corrected. (I had originally put ATL03, but we actually use ATL06.)

A suggested first paragraph:

"This notebook demonstrates searching for cloud-hosted ICESat-2 data and directly accessing a Land Ice Height (ATL06) file from an Amazon Elastic Compute Cloud (EC2) instance using the earthaccess package. NASA data "in the cloud" are stored in AWS Simple Storage Service (S3) buckets. Direct access is an efficient way to work with data stored in an S3 bucket when you are working in the cloud. Cloud-hosted files can be opened and loaded into memory without the need to download them first. This allows you to take advantage of the scalability and power of cloud computing.

As an example dataset, we use ICESat-2 Land Ice Height (ATL06) over the Juneau Icefield, AK, for March 2003. ICESat-2 data files, including ATL06, are stored in HDF5 format. We demonstrate how to open an HDF5 file and access data variables using xarray. Land Ice Heights are then plotted using hvplot."
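The workflow described in the suggested paragraph could be sketched roughly as follows. This is an illustration, not the notebook's code: the bounding box and dates are placeholder values, the function name `open_land_ice_segments` is hypothetical, and it assumes `earthaccess`, `xarray` and `h5netcdf` are installed and that the code runs in AWS region us-west-2.

```python
"""Sketch: search for cloud-hosted ATL06 files and open one directly from S3."""

# Illustrative search parameters. The bounding box and dates are placeholders
# (assumptions), not values taken from the notebook.
SEARCH_PARAMS = {
    "short_name": "ATL06",
    "bounding_box": (-134.7, 58.9, -133.9, 59.2),  # (west, south, east, north)
    "temporal": ("2020-03-01", "2020-04-30"),
    "cloud_hosted": True,
}


def open_land_ice_segments(params=None, beam="gt1l"):
    """Return an xarray.Dataset for one beam of the first matching file."""
    # Imported here so the parameters above can be inspected without
    # these third-party packages being installed.
    import earthaccess
    import xarray as xr

    params = SEARCH_PARAMS if params is None else params

    earthaccess.login()  # earthaccess also obtains the S3 credentials
    results = earthaccess.search_data(**params)
    if not results:
        raise LookupError("no files matched the search")

    # earthaccess.open returns file-like objects; nothing is written to disk.
    files = earthaccess.open(results[:1])
    return xr.open_dataset(files[0], group=f"/{beam}/land_ice_segments")
```

In a notebook, something like `ds = open_land_ice_segments()` followed by `ds.h_li` would then give the land ice heights for that beam, which could be passed on to hvplot.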

I have asked the cloud devs for a plain language definition of direct access.

I think the S3 token blurb can be moved to a section that describes earthaccess and why we use it. Maybe all we need to say is that earthaccess manages getting S3 access tokens.

"On-prem" is jargon. I'd have to look at some earlier discussions, but I think "NSIDC DAAC-hosted" vs "cloud-hosted" might be a good way to distinguish on-prem from cloud-hosted.

Add "an AWS EC2 instance in region us-west-2" to prerequisites.

Do we need the list of packages after the prerequisites? I forget what the template is. If we need to have packages listed, then maybe remove them from the learning objectives. However, that seems a better place for them.

In Authenticate section: "Login requires your Earthdata login and password..."

I wonder if it makes more sense for the description/explanation of what the get method returns to be after the command.

Maybe we need a sub-heading for "Searching for cloud-hosted" data.

I can't decide if "spatiotemporal" is too jargony or not. Is "search using a geographic bounding box and date range" better?
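To make "a geographic bounding box and date range" concrete, here is a minimal sketch of how the two could be validated and packaged before searching. The helper name `build_search_kwargs` is hypothetical; the returned keys follow the `short_name`, `bounding_box` and `temporal` keywords of `earthaccess.search_data`, and the example values are placeholders.

```python
from datetime import date


def build_search_kwargs(short_name, bbox, start, end):
    """Validate a bounding box and date range; return search keyword arguments.

    bbox is (west, south, east, north) in decimal degrees.
    start and end are ISO dates such as "2020-03-01".
    """
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be between -180 and 180")
    if not (-90 <= south < north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south < north <= 90")
    # date.fromisoformat raises ValueError for malformed dates.
    if date.fromisoformat(start) > date.fromisoformat(end):
        raise ValueError("start date must not be after end date")
    return {
        "short_name": short_name,
        "bounding_box": (west, south, east, north),
        "temporal": (start, end),
    }


# Placeholder values for illustration only.
kwargs = build_search_kwargs(
    "ATL06", (-134.7, 58.9, -133.9, 59.2), "2020-03-01", "2020-04-30"
)
```

The resulting dictionary could then be unpacked into the search call, for example `earthaccess.search_data(**kwargs)`.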

I think now that the ICESat-2 data is public Query.hits() will return the count of files found.

We had a discussion in Openscapes about whether we should use granules (NASA-speak) or files. The consensus was to use files. I'm not sure what USO does. It is probably better to be consistent.

I talk about photon clouds in the markdown but the code deals with land ice heights. I need to change this.

andypbarrett avatar Mar 29 '23 04:03 andypbarrett

@andypbarrett @asteiker this is ready for another review when you have time. Andy, I have hopefully addressed all your comments. One thing I want to highlight is that I have ensured the terms granules and collections are used throughout. I chose these instead of files and data sets because earthaccess uses the granules and collections terminology and I was trying to stay consistent with that. I understand that Openscapes has decided on files, and I did briefly raise this issue with Seamus and USO, but in short I think this needs a larger discussion about which terminology to use and how.

jroebuck932 avatar Apr 05 '23 16:04 jroebuck932

@jroebuck932 Thanks for raising this issue! I know that @andypbarrett and I had gone back and forth on granule vs file, as there are pros/cons to using both terms. I think guidance from SCG would be really helpful here so that we can be consistent across all our content at NSIDC. Given that earthaccess utilizes "granule" and "collection", as these are the terms used on the backend within CMR, then I think it's reasonable to stick with those.

asteiker avatar Apr 05 '23 23:04 asteiker

I agree on granules vs files, etc. Consistency is important. I think a glossary would help to explain granules == files and collections == datasets.
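One lightweight way to keep such a glossary next to the notebook is a small lookup table. This is only a sketch: the names `GLOSSARY` and `gloss` are hypothetical, and the two entries are the equivalences stated above.

```python
# NASA/CMR term -> plain-language equivalent, as stated in this thread.
GLOSSARY = {
    "granule": "file",
    "collection": "dataset",
}


def gloss(term):
    """Return the term followed by its plain-language equivalent, if known."""
    equivalent = GLOSSARY.get(term.lower())
    return f"{term} ({equivalent})" if equivalent else term
```

With this, `gloss("granule")` gives `granule (file)`, while a term that is not in the table is returned unchanged.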

andypbarrett avatar Apr 06 '23 17:04 andypbarrett

I think this looks good. A few typos and suggestions. I ran the notebook in 2i2c and had no problems.

Here are my comments, thoughts and suggestions.

Under Learning Objectives: suggest "you" will be able rather than "we".

Prerequisites: Suggest "So you need to create an EC2 instance in the us-west-2 region". Do we have a glossary of cloud terms or an intro to the cloud that explains instances? If not, we should explain it here.

Import Packages: "This tutorial requires the following packages:"

"This can be entered using an Environment Variable - the default strategy - or using either a .netrc file or interactively." I am wondering if this is necessary. Could we just say "We use a .netrc strategy. For other strategies see earthaccess.login"?

The code cell for Authentication uses "interactive" but we say we will use netrc.
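For reference, the three strategies mentioned here could be told apart with a helper like the one below. It is a sketch under assumptions: `choose_login_strategy` is a hypothetical name, and the environment variable names are assumptions that have differed between earthaccess releases, so check them against the installed version. The returned strings match the `strategy` values accepted by `earthaccess.login`.

```python
import netrc
import os
from pathlib import Path

EDL_HOST = "urs.earthdata.nasa.gov"  # Earthdata Login host


def choose_login_strategy(
    env=None,
    netrc_path=None,
    user_var="EARTHDATA_USERNAME",      # assumed name; verify for your version
    password_var="EARTHDATA_PASSWORD",  # assumed name; verify for your version
):
    """Return "environment", "netrc" or "interactive", in that order of preference."""
    env = os.environ if env is None else env
    if env.get(user_var) and env.get(password_var):
        return "environment"

    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    try:
        # authenticators() returns None when the host has no entry.
        if netrc.netrc(str(path)).authenticators(EDL_HOST):
            return "netrc"
    except (OSError, netrc.NetrcParseError):
        pass  # missing or unreadable .netrc: fall through to a prompt

    return "interactive"
```

The result could be passed straight through, for example `earthaccess.login(strategy=choose_login_strategy())`, so that the text and the code cell cannot drift apart.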

Under Use Direct-Access to... " using a package"

andypbarrett avatar Apr 06 '23 19:04 andypbarrett

@andypbarrett @asteiker I have made all the suggested changes. Andy, please take another quick look through, and if all looks good, then before merging I will create a rendered version of the notebook to add to the repo.

jroebuck932 avatar Apr 11 '23 22:04 jroebuck932

The only item I see that needs to be changed is to remove the import pprint because it is not used, as far as I can see. I'm happy to merge once that is done.

andypbarrett avatar Apr 12 '23 21:04 andypbarrett

@andypbarrett thank you for making that change. Long story short, I am having an issue creating an accurate rendered version of the notebook, as there is a bug in the latest version of earthaccess and the S3 access (earthaccess.open) doesn't work. So I am wondering if we want to wait until that is fixed in the new version before we merge this? I checked with Luis and I think a new version should be available next week.

In more detail, the S3 access does work with the previous version of earthaccess. However, in the tutorial we state that Query = earthaccess.collection_query().keyword('ICESat-2') will list both cloud and ECS hosted data sets, whereas with the previous version of earthaccess it will only list the cloud hosted data sets.

jroebuck932 avatar Apr 14 '23 15:04 jroebuck932

> creating an accurate rendered version of the notebook

@jroebuck932 Is this just a problem on the rendered notebook? I'd rather this not get hung up on another release of earthaccess but I don't know if I fully understand the problem. Perhaps we could merge this without the rendered version first and come back and update with a rendered notebook after this is fixed.

asteiker avatar Apr 17 '23 15:04 asteiker

@asteiker @andypbarrett I have added Readme.md and environment.yml in the notebook folder and also updated the main README.md. I think this is ready for merging, unless you want to take a quick look at the Readme files.

jroebuck932 avatar Apr 19 '23 18:04 jroebuck932

We will merge this with earthaccess v0.5.0 pinned, and will update once v0.5.2 is released, along with tool metadata for the nsidc website, updated main Readme, and we may also want to caveat usability in Binder given that this requires you to be running in-region.
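A guard cell at the top of the notebook is one possible way to make the pin visible to readers. The sketch below assumes the pin is 0.5.0 as stated above; the function name `installed_matches_pin` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

PINNED_VERSION = "0.5.0"  # the pin mentioned in this thread


def installed_matches_pin(package="earthaccess", pinned=PINNED_VERSION):
    """Return True only if the package is installed at exactly the pinned version."""
    try:
        return version(package) == pinned
    except PackageNotFoundError:
        return False


if not installed_matches_pin():
    print(f"Note: this notebook was tested with earthaccess=={PINNED_VERSION}.")
```

This only warns; it does not replace the pin in environment.yml.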

asteiker avatar Apr 20 '23 22:04 asteiker