pangeo-cloud-federation
Pangeo Community Meeting Working Group Report & Wishlist
At the August 20, 2019, Pangeo Community Meeting, we had a brief session to discuss alternatives for managing hubs in the future. Two important ideas that came up are to:
- Refactor to remove the staging branch. When a pull request is issued, a temporary mirror of the current environment is created with the changes applied. With this implementation, each hub could have its own branch, and staging --> production pull requests would not cause interference among hubs.
- Maintain a single well-documented "template" repository for deployments under the pangeo-data GitHub org. All hubs could build off that template repo but live in their own repositories. People are currently overwhelmed by the options for deployment and are looking for a step-by-step install guide. This could possibly live in the Zero to JupyterHub with Kubernetes docs.
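One way to picture the first idea: each open pull request gets its own short-lived deployment, keyed by hub and PR number, so changes to two different hubs never share a single staging environment. The Python below is a minimal, hypothetical sketch of only the naming and cleanup bookkeeping. `Preview`, `plan`, and `to_tear_down` are illustrative names, not part of pangeo-cloud-federation, Helm, or Kubernetes, and the actual deploy and teardown steps are left out.

```python
import re
from dataclasses import dataclass

# Hypothetical model of per-pull-request preview deployments; none of these
# names come from pangeo-cloud-federation, Helm, or Kubernetes.

MAX_LABEL = 63  # Kubernetes DNS-1123 label length limit (e.g. namespace names)


@dataclass(frozen=True)
class Preview:
    hub: str        # which hub the PR modifies, e.g. "ocean" (example name)
    pr_number: int  # pull request that triggered the preview

    @property
    def name(self) -> str:
        """Namespace/release name unique to this (hub, PR) pair."""
        suffix = f"-pr-{self.pr_number}"
        # Lowercase and replace runs of characters outside [a-z0-9-] with "-";
        # assumes the hub name contains at least one alphanumeric character.
        slug = re.sub(r"[^a-z0-9-]+", "-", self.hub.lower()).strip("-")
        # Truncate the slug so the full name fits within the label limit.
        return slug[: MAX_LABEL - len(suffix)].rstrip("-") + suffix


def plan(open_prs: list[tuple[str, int]]) -> dict[str, Preview]:
    """Give every open PR its own preview; fail loudly if two would share a name."""
    previews: dict[str, Preview] = {}
    for hub, pr in open_prs:
        p = Preview(hub, pr)
        if p.name in previews:
            raise ValueError(f"name collision: {p.name}")
        previews[p.name] = p
    return previews


def to_tear_down(running: set[str], open_prs: list[tuple[str, int]]) -> set[str]:
    """Previews still running whose PR is no longer open (merged or closed)."""
    return running - set(plan(open_prs))


if __name__ == "__main__":
    prs = [("ocean", 101), ("hydro", 102), ("ocean", 103)]
    print(sorted(plan(prs)))
    print(to_tear_down({"ocean-pr-99", "hydro-pr-102"}, prs))
```

Deriving the name from the (hub, PR) pair is what provides the isolation in this sketch: a change to one hub can be built, tested, and discarded without touching any other hub's preview, and cleanup reduces to removing whatever is running but no longer matches an open PR.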
Wishlist
Below is a crowd-sourced "anything goes" wishlist from both users and administrators of the hub, quickly categorized:
User convenience
- Secure push / pull integration with GitHub / GitLab
- SSH / SCP / FTP access to your home directory
- Propagate user environments from notebooks to dask workers
- Submit batch jobs rather than interactive notebooks
- Ability for users to have a single S3 bucket that is accessible only to them and a common bucket accessible from the Hub and non-hub machines (e.g. HPCs, laptops, lab servers)
- Possibility to execute line by line within the same cell?
Performance
- Continued development of mixing on-demand and spot instances for notebook servers and dask workers
- Faster spin-up/teardown of workers
IT Control
- Secure PVC / home directories
- Expand community of operations folks working in this space
- Tutorials on operationalizing your JupyterHub
- Avoid giving users advanced permissions on Kubernetes
- Quotas on User home directories (we're currently giving unlimited NFS storage space!)
- Segregate git-crypt keys or use another system
- Remove staging and test changes in a temporary deployment/release instead
Generalize
- Defining Pangeo for X: which pieces of Pangeo would make sense in other domain sciences
- Abstract out a generic pangeo-cloud deployment from specific deployments
- More turnkey deployments on traditional HPC environments (may need some coordination with HPC centers for deploying hubs, etc.)
- Using images built for cloud deployments on HPC, and vice versa
- The Littlest Pangeo JupyterHub
- The Littlest BinderHub
Diagnostics
- JupyterLab widgets to track Kubernetes pods
- Log usage statistics and have system to visualize. Track costs at user level.
Thanks to all participants for input!
- Yuvi Panda (Project Jupyter)
- James Munroe (MUN)
- Scott Black (UWRL)
- Matt Rocklin (NVIDIA)
- Rodrigo Manzanas (IPCC)
- Hillary Scannell (UW)
- Shreyas Cholia (LBL)
- Tim Crone (Lamont-Doherty Earth Observatory)
- Mary Romelfanger (STScI Space Telescope Science Institute)
- Friedrich Knuth (UW)
- Kirstie Haynie (USGS)
- Nicholas Sofroniew (CZI)
- Jacob Matuskey (STScI Space Telescope Science Institute)