pangeo-cloud-federation Pangeo Community Meeting Working Group Report & Wishlist

Open scottyhq opened this issue 5 years ago • 0 comments

At the August 20, 2019 Pangeo Community Meeting we had a brief session to discuss alternatives for managing hubs in the future. Two important ideas that came up are to:

Refactor to remove the staging branch. When a pull request is issued, a temporary mirror of the current environment is created with the changes applied. With this implementation, each hub could have its own branch and staging --> production pull requests would not cause interference among hubs.
Maintain a single well-documented "template" repository for deployments under the pangeo-data GitHub org. All hubs could build off that template repo but live in their own repositories. People are currently overwhelmed by the options for deployment and are looking for a step-by-step install guide. This could possibly live in the Zero to JupyterHub with Kubernetes docs.

Wishlist

Below is a crowd-sourced "anything goes" wishlist from both users and administrators of the hub, quickly categorized:

User convenience

Secure push / pull integration with GitHub / GitLab
SSH / SCP / FTP access to your home directory
Propagate user environments from notebooks to dask workers
Submit batch jobs rather than interactive notebooks
Ability for users to have a single S3 bucket that is accessible only to them and a common bucket accessible from the Hub and non-hub machines (e.g. HPCs, laptops, lab servers)
Possibility to execute line by line within the same cell?
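
As a starting point for the "propagate user environments" item above, here is a minimal sketch of how one might detect drift between a notebook environment and a worker environment. It only compares package versions and does not propagate anything. The function names `installed_packages` and `diff_environments` are hypothetical and not part of Pangeo or Dask. The worker side could presumably be collected by running `installed_packages` on each worker (for example through `Client.run` in `dask.distributed`), but that part is left out here.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return {package name: version} for the current Python environment."""
    packages = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken or missing metadata
            packages[name.lower()] = dist.version
    return packages


def diff_environments(notebook_env, worker_env):
    """Return {name: (notebook version, worker version)} for every mismatch.

    A version of None means the package is missing on that side.
    """
    mismatches = {}
    for name in sorted(set(notebook_env) | set(worker_env)):
        nb_version = notebook_env.get(name)
        wk_version = worker_env.get(name)
        if nb_version != wk_version:
            mismatches[name] = (nb_version, wk_version)
    return mismatches


# Usage with hand-written (hypothetical) environments:
notebook = {"xarray": "0.12.3", "zarr": "2.3.2", "gcsfs": "0.3.0"}
worker = {"xarray": "0.12.1", "zarr": "2.3.2"}
print(diff_environments(notebook, worker))
```

A report like this only tells the user what differs; actually installing the missing packages on workers is a separate design question.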

Performance

Continued development of mixing on-demand and spot instances for notebook servers and dask workers
Faster spinup/teardown of workers

IT Control

Secure PVC / home directories
Expand community of operations folks working in this space
Tutorials on operationalizing your JupyterHub
Avoid giving users advanced permissions on Kubernetes
Quotas on user home directories (we're currently giving unlimited NFS storage space!)
Segregate git-crypt keys or use another system
Remove the staging branch and test changes on a temporary deployment/release instead
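
For the home directory quota item above, a rough sketch of a reporting script is shown below. It assumes, hypothetically, that every user's home is a direct subdirectory of one shared root on the NFS volume, and it only reports who is over a limit. Real enforcement would more likely happen at the storage layer. The names `directory_usage_bytes` and `over_quota` are made up for this example.

```python
import os


def directory_usage_bytes(path):
    """Sum the sizes of all regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # The file vanished or is unreadable; ignore it in this sketch.
                pass
    return total


def over_quota(home_root, quota_bytes):
    """Return {user directory name: bytes used} for every home over quota."""
    offenders = {}
    for entry in os.scandir(home_root):
        if entry.is_dir(follow_symlinks=False):
            used = directory_usage_bytes(entry.path)
            if used > quota_bytes:
                offenders[entry.name] = used
    return offenders
```

Note that this counts apparent file sizes rather than blocks on disk, and walking a large NFS tree can be slow, so it is only meant to illustrate the idea.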

Generalize

Define "Pangeo for X": which pieces of Pangeo would make sense in other domain sciences
Abstract out a generic pangeo-cloud deployment from specific deployments
More turnkey deployments on traditional HPC environments (may need some coordination with HPC centers for deploying hubs etc.)
Using images built for cloud deployments on HPC, and vice versa
The Littlest Pangeo JupyterHub
The Littlest BinderHub

Diagnostics

JupyterLab widgets to track Kubernetes pods
Log usage statistics and have a system to visualize them. Track costs at the user level.
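
To make the per-user cost tracking idea a bit more concrete, here is a small sketch that aggregates usage records into a cost per user. The record fields (`user`, `instance_type`, `hours`), the instance type names, and the hourly rates are all hypothetical. In practice the records would have to come from whatever logging the hub ends up with.

```python
from collections import defaultdict


def cost_per_user(usage_records, hourly_rates):
    """Return {user: total cost} given usage records and a rate table.

    Each record is a dict with "user", "instance_type" and "hours".
    hourly_rates maps an instance type to its price per hour.
    """
    totals = defaultdict(float)
    for record in usage_records:
        rate = hourly_rates[record["instance_type"]]
        totals[record["user"]] += record["hours"] * rate
    return dict(totals)


# Made-up numbers, for illustration only.
rates = {"small": 0.5, "large": 2.0}
records = [
    {"user": "alice", "instance_type": "small", "hours": 4.0},
    {"user": "alice", "instance_type": "large", "hours": 1.0},
    {"user": "bob", "instance_type": "large", "hours": 3.0},
]
print(cost_per_user(records, rates))
```

Keeping the rate table separate from the usage log means spot and on-demand prices can be updated without touching the recorded usage.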

Thanks to all participants for input!

Yuvi Panda (Project Jupyter)
James Munroe (MUN)
Scott Black (UWRL)
Matt! Rocklin (NVIDIA)
Rodrigo Manzanas (IPCC)
Hillary Scannell (UW)
Shreyas Cholia (LBL)
Tim Crone (Lamont-Doherty Earth Observatory)
Mary Romelfanger (STScI Space Telescope Science Institute)
Friedrich Knuth (UW)
Kirstie Haynie (USGS)
Nicholas Sofroniew (CZI)
Jacob Matuskey (STScI Space Telescope Science Institute)

Aug 26 '19 17:08 scottyhq
