
Pangeo Community Meeting Working Group Report & Wishlist

Open scottyhq opened this issue 5 years ago • 0 comments

At the August 20, 2019 Pangeo Community Meeting we had a brief session to discuss alternatives for managing hubs in the future. Two important ideas that came up are to:

  1. Refactor to remove the staging branch. When a pull request is issued, a temporary mirror of the current environment is created with the changes applied. With this implementation, each hub could have its own branch and staging --> production pull requests would not cause interference among hubs.
  2. Maintain a single well-documented "template" repository for deployments under the pangeo-data GitHub org. All hubs could build off that template repo but live in their own repositories. People are currently overwhelmed by the options for deployment and are looking for a step-by-step install guide. This guide could possibly live in the Zero to JupyterHub with Kubernetes docs.
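
As a rough illustration of the first idea, each pull request could get its own throwaway deployment whose name is derived from the hub and the PR number, so one hub's test deployment never collides with another's. The function name and the `<hub>-pr-<number>` naming scheme below are hypothetical and are not something the repository currently does. The only fixed constraint assumed is that a Kubernetes namespace name must be a valid DNS label (lowercase alphanumerics and `-`, at most 63 characters).

```python
import re

MAX_LABEL_LEN = 63  # Kubernetes namespace names must be valid DNS labels


def ephemeral_namespace(hub: str, pr_number: int) -> str:
    """Return a hypothetical per-PR namespace name such as 'ocean-pr-42'."""
    if pr_number <= 0:
        raise ValueError("pr_number must be positive")
    suffix = f"-pr-{pr_number}"
    # Lowercase the hub name and replace anything outside [a-z0-9-] with '-'.
    slug = re.sub(r"[^a-z0-9-]+", "-", hub.lower()).strip("-")
    if not slug:
        raise ValueError("hub name has no usable characters")
    # Truncate the hub part so the full name still fits in one DNS label.
    slug = slug[: MAX_LABEL_LEN - len(suffix)].rstrip("-")
    return slug + suffix
```

A CI job could create this namespace when a PR opens and delete it when the PR is merged or closed, which is the "temporary mirror" described above.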

Wishlist

Below is a crowd-sourced "anything goes" wishlist from both users and administrators of the hub, quickly categorized:

User convenience

  • Secure push / pull integration with GitHub / GitLab
  • SSH / SCP / FTP access to your home directory
  • Propagate user environments from notebooks to dask workers
  • Submit batch jobs rather than interactive notebooks
  • Ability for users to have a single S3 bucket that is accessible only to them and a common bucket accessible from the Hub and non-hub machines (e.g. HPCs, laptops, lab servers)
  • Possibility to execute line by line within the same cell?

Performance

  • Continued development of mixing on-demand and spot instances for notebook servers and dask workers
  • Faster spinup/teardown of workers

IT Control

  • Secure PVC / home directories
  • Expand community of operations folks working in this space
  • Tutorials on operationalizing your JupyterHub
  • Avoid giving users advanced permissions on Kubernetes
  • Quotas on user home directories (we're currently giving unlimited NFS storage space!)
  • Segregate git-crypt keys or use another system
  • Remove staging and test deployment to temporary deployment/release

Generalize

  • Defining Pangeo for X - what pieces of pangeo would make sense in other domain sciences
  • Abstract out a generic pangeo-cloud deployment from specific deployments
  • More turnkey deployments on traditional HPC environments (may need some coordination for HPC centers for deploying hubs etc.)
  • Using images built for cloud deployments on HPC, and vice versa
  • The Littlest Pangeo JupyterHub
  • The Littlest BinderHub

Diagnostics

  • JupyterLab widgets to track Kubernetes pods
  • Log usage statistics and have a system to visualize them. Track costs at the user level.
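
For the user-level cost tracking item, a minimal sketch of the aggregation step might look like the following. It assumes usage has already been logged as one record per user session; the `UsageRecord` fields and the flat per-CPU-hour rate are hypothetical simplifications, since real cloud billing also depends on instance type, spot pricing, and storage.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class UsageRecord(NamedTuple):
    """One logged session (hypothetical schema)."""
    user: str
    cpu_hours: float


def cost_per_user(records: Iterable[UsageRecord],
                  rate_per_cpu_hour: float) -> Dict[str, float]:
    """Sum the estimated cost of all sessions for each user."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.user] += record.cpu_hours * rate_per_cpu_hour
    return dict(totals)


if __name__ == "__main__":
    log = [
        UsageRecord("alice", 2.0),
        UsageRecord("bob", 1.0),
        UsageRecord("alice", 4.0),
    ]
    for user, cost in sorted(cost_per_user(log, 0.5).items()):
        print(f"{user}: ${cost:.2f}")
```

The resulting per-user totals could then feed whatever dashboard is used for visualization.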

Thanks to all participants for input!

  • Yuvi Panda (Project Jupyter)
  • James Munroe (MUN)
  • Scott Black (UWRL)
  • Matt Rocklin (NVIDIA)
  • Rodrigo Manzanas (IPCC)
  • Hillary Scannell (UW)
  • Shreyas Cholia (LBL)
  • Tim Crone (Lamont-Doherty Earth Observatory)
  • Mary Romelfanger (STScI Space Telescope Science Institute)
  • Friedrich Knuth (UW)
  • Kirstie Haynie (USGS)
  • Nicholas Sofroniew (CZI)
  • Jacob Matuskey (STScI Space Telescope Science Institute)

scottyhq, Aug 26 '19 17:08