Deploying examples.pyviz.org live notebooks and apps
A project in this repo consists of a directory at the top level containing various notebooks along with an anaconda-project.yml file that specifies the environment plus possibly multiple commands (e.g. "notebooks" or "dashboard") for how to deploy the various artifacts in that project. Launching the deployment associated with each command on our deployment server requires Jean-Luc to:
- Choose an appropriate resource profile ("Medium" by default, but "Large" for ship_traffic right now).
- Choose a deployment "endpoint name" (really a server name) for that command, such as `attractors-notebooks.pyviz.demo.anaconda.com` or `attractors.pyviz.demo.anaconda.com`
- Launch the deployment
- Rebuild the website page for that project (if needed) to pick up the link to that deployment, looking either for `https://servername.pyviz.demo.anaconda.com/notebookbasename` for a dashboard or `https://servername.pyviz.demo.anaconda.com/notebooks/notebookbasename.ipynb` for a notebook.
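
To make the two link patterns concrete, here is a minimal Python sketch of how the website build could construct them. The function names are hypothetical and not part of any existing build script; only the domain and the URL shapes come from the description above.

```python
# Domain taken from the endpoint examples above.
DOMAIN = "pyviz.demo.anaconda.com"


def dashboard_url(servername: str, notebook_basename: str) -> str:
    """URL pattern for a dashboard: https://servername.<domain>/notebookbasename"""
    return f"https://{servername}.{DOMAIN}/{notebook_basename}"


def notebook_url(servername: str, notebook_basename: str) -> str:
    """URL pattern for a notebook: https://servername.<domain>/notebooks/notebookbasename.ipynb"""
    return f"https://{servername}.{DOMAIN}/notebooks/{notebook_basename}.ipynb"


if __name__ == "__main__":
    # "my_notebook" is a placeholder base name, not a real file in the repo.
    print(dashboard_url("attractors", "my_notebook"))
    print(notebook_url("attractors-notebooks", "my_notebook"))
```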
The deployment process is currently manual: it relies on Jean-Luc reading the .yml to know what there is to deploy, and on him remembering things like the resource profile and the endpoint name. We'd like to automate this process or at least make it automatable, because it takes human effort and does not always go smoothly.
To deploy a given project, we need to launch some number of servers, each with some number of URLs that could be linked to, using some number of anaconda-project commands. As an example, for the attractors project, we have:
- 3 .ipynb files
- 3 .html pages (one per .ipynb)
- 5 potential endpoints (3 notebooks, 2 panel apps, all launched but not necessarily linked from any page)
- 3 actual endpoints linked from examples.pyviz (up to one per .html page, either a notebook or a dashboard)
- 2 servers (1 Jupyter, 1 Panel)
- 2 commands (one per server)
The actual commands involved here currently have a variety of names, and it's not clear how the commands map to the endpoints provided. To make that clear and consistent, we briefly considered forcing a 1-1 mapping from actual endpoint to server to command (i.e. one command would launch one server providing one endpoint), but doing that would require extra servers, make monitoring more complex, and would use more server resources for little benefit. We also considered whether to have two classes of project, one with such a 1-1 mapping and "everything else" (currently only 4 out of the 40 projects) having a different structure, but that seems difficult to manage.
So to pin it down into something fully consistent that does not rely on what's in Jean-Luc's head, we (Philipp, Jean-Luc, and I) propose:
- [x] Assume all resource profiles are "Medium"; only ship_traffic is using Large, and monitoring suggests that it's only using 30% of the memory, so it's safe to scale that by half to make it "Medium" as well. We may eventually want to allow automatic selection of resource profile, but it looks like we don't currently need that.
- [ ] Every project has up to two deployment commands. The only allowed commands are `notebook` and `dashboard`. Any existing projects with other commands should be updated; e.g. some say `notebooks` or `dashboards`, but that diversity is just going to make things more complicated; `notebook` and `dashboard` should be sufficient.
- [ ] The servername for the `dashboard` command should be the same as the given project, with underscores replaced by hyphens if needed. (We could consider renaming the project name to use hyphens so that it works as a server name, but that would break existing links, so maybe better to avoid that.)
- [ ] The servername for the `notebook` command should be `projectname-notebook`, singular as in "Jupyter Notebook", whether or not there are multiple notebooks. Note that the Attractor notebook server is actually `attractors-notebooks` right now, and if it's too painful to update that, please edit this rule to say it's always plural instead. But we must pick one!
- [ ] Ideally, we'd have an automated script that checks which deployment command(s) are present in the `anaconda-project.yml` file, then launches the deployment (killing any existing deployments by that name?), but if not, at least we could be blindly following these steps with no decisions required. Either way, will need to re-launch every project's deployments.
- [ ] After re-launching the deployments, will need to check and probably just rebuild every web page, and check that the web page and its deployments are working.
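
As a rough illustration of the naming rules proposed in this list, the sketch below maps a project's commands to server names. It assumes the command names have already been read from the project's `anaconda-project.yml`; the function names are hypothetical, and applying the underscore-to-hyphen rule to the notebook server as well is an assumption (the proposal only states it for `dashboard`). Actually launching or killing deployments is left out.

```python
ALLOWED_COMMANDS = ("notebook", "dashboard")


def server_name(project: str, command: str) -> str:
    """Return the server name for one deployment command of a project."""
    if command not in ALLOWED_COMMANDS:
        raise ValueError(f"unsupported command {command!r}; use one of {ALLOWED_COMMANDS}")
    # Underscores are not valid in host names, so swap them for hyphens.
    # Assumption: the same replacement applies to the notebook server.
    base = project.replace("_", "-")
    return base if command == "dashboard" else f"{base}-notebook"


def plan_deployments(project: str, commands) -> dict:
    """Map each command found in the project file to its server name.

    Any command outside ALLOWED_COMMANDS (e.g. "notebooks") is reported
    as an error so the project file can be updated.
    """
    unknown = sorted(set(commands) - set(ALLOWED_COMMANDS))
    if unknown:
        raise ValueError(f"{project}: rename or remove commands {unknown}")
    return {c: server_name(project, c) for c in ALLOWED_COMMANDS if c in commands}


if __name__ == "__main__":
    print(plan_deployments("attractors", ["notebook", "dashboard"]))
    print(plan_deployments("ship_traffic", ["dashboard"]))
```

With rules like these, nothing about a deployment has to be remembered by a person: the project directory name and the command name determine the server name.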
Whew! But this all does need to be done, as nearly every project currently needs redeploying and to have its web page rebuilt, so it may as well all be made consistent first.
I now vote for `notebook` and `dashboard` as command names (singular) as that is the common case and not unreasonable even if there are multiple notebooks (which is rarer).
One thing about automating deployment: I would still want some declaration in the yaml to say whether or not it should have a live deployment at all, even if everything else can be specified by convention.
Sounds good. In principle, one might want such a declaration to apply to either the `dashboard` or the `notebook` target specifically. But that's difficult because anaconda-project commands don't allow user fields, and in practice, I don't think we're likely to write a dashboard if we don't also want to deploy it. So I think it's sufficient to have a global user field enabling or disabling all deployments for that project. Given that we do deploy nearly all projects in practice, such a flag should presumably be opt-out, e.g. `no_deployments` or `auto_deploy: False` (defaulting to True).
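
A minimal sketch of how a deployment script might honor such an opt-out flag, assuming the `anaconda-project.yml` contents have already been parsed into a dict and that the flag is a top-level key named `auto_deploy` (both the key's location and the helper name are assumptions for illustration):

```python
def should_deploy(project_spec: dict) -> bool:
    """Return True unless the project explicitly opts out of deployment.

    Assumption: `auto_deploy` is a top-level user field in the parsed
    project file; projects that omit it are deployed by default.
    """
    return bool(project_spec.get("auto_deploy", True))


if __name__ == "__main__":
    print(should_deploy({"name": "attractors"}))                     # no flag: deploy
    print(should_deploy({"name": "example", "auto_deploy": False}))  # opted out
```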