Deploying examples.pyviz.org live notebooks and apps

Open jbednar opened this issue 3 years ago • 2 comments

A project in this repo consists of a directory at the top level containing various notebooks along with an anaconda-project.yml file that specifies the environment plus possibly multiple commands (e.g. "notebooks" or "dashboard") for how to deploy the various artifacts in that project. Launching the deployment associated with each command on our deployment server requires Jean-Luc to:

  1. Choose an appropriate resource profile ("Medium" by default, but "Large" for ship_traffic right now).
  2. Choose a deployment "endpoint name" (really a server name) for that command, such as attractors-notebooks.pyviz.demo.anaconda.com or attractors.pyviz.demo.anaconda.com
  3. Launch the deployment
  4. Rebuild the website page for that project (if needed) to pick up the link to that deployment, looking either for https://servername.pyviz.demo.anaconda.com/notebookbasename for a dashboard or https://servername.pyviz.demo.anaconda.com/notebooks/notebookbasename.ipynb for a notebook.
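As a rough illustration of the two link patterns in step 4, here is a minimal Python sketch. The function name `deployment_url`, the `kind` argument, and the sample notebook basename are hypothetical; only the URL shapes come from the description above.

```python
BASE_DOMAIN = "pyviz.demo.anaconda.com"


def deployment_url(servername: str, notebook_basename: str, kind: str) -> str:
    """Build the link the website page would look for (hypothetical helper).

    kind is "dashboard" or "notebook"; notebook_basename is the .ipynb
    filename without its extension.
    """
    root = f"https://{servername}.{BASE_DOMAIN}"
    if kind == "dashboard":
        # Dashboards are served directly under the notebook's base name.
        return f"{root}/{notebook_basename}"
    if kind == "notebook":
        # Notebooks are served under /notebooks/ with the .ipynb suffix.
        return f"{root}/notebooks/{notebook_basename}.ipynb"
    raise ValueError(f"unknown deployment kind: {kind!r}")


# "some_notebook" is a placeholder basename, not a real file in the repo.
print(deployment_url("attractors", "some_notebook", "dashboard"))
print(deployment_url("attractors-notebooks", "some_notebook", "notebook"))
```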

The deployment process is currently manual: it relies on Jean-Luc reading the .yml to see what there is to deploy and remembering details such as the resource profile and the endpoint name. We'd like to automate this process, or at least make it automatable, because it takes human effort and does not always go smoothly.

To deploy a given project, we need to launch some number of servers, each with some number of URLs that could be linked to, using some number of anaconda-project commands. As an example, for the attractors project, we have:

  • 3 .ipynb files
  • 3 .html pages (one per .ipynb)
  • 5 potential endpoints (3 notebooks, 2 panel apps, all launched but not necessarily linked from any page)
  • 3 actual endpoints linked from examples.pyviz (up to one per .html page, either a notebook or a dashboard)
  • 2 servers (1 Jupyter, 1 Panel)
  • 2 commands (one per server)

The actual commands involved here currently have a variety of names, and it's not clear how the commands map to the endpoints provided. To make that clear and consistent, we briefly considered forcing a 1-1 mapping from actual endpoint to server to command (i.e. one command would launch one server providing one endpoint), but doing that would require extra servers, make monitoring more complex, and use more server resources for little benefit. We also considered having two classes of project, one with such a 1-1 mapping and "everything else" (currently only 4 out of the 40 projects) with a different structure, but that seems difficult to manage.

So to pin it down into something fully consistent that does not rely on what's in Jean-Luc's head, we (Philipp, Jean-Luc, and I) propose:

  1. [x] Assume all resource profiles are "Medium"; only ship_traffic is using Large, and monitoring suggests that it's only using 30% of the memory, so it's safe to scale that by half to make it "Medium" as well. We may eventually want to allow automatic selection of resource profile, but it looks like we don't currently need that.
  2. [ ] Every project has up to two deployment commands. The only allowed commands are notebook and dashboard. Any existing projects with other commands should be updated; e.g. some say notebooks or dashboards, but that diversity is just going to make things more complicated; notebook and dashboard should be sufficient.
  3. [ ] The servername for the dashboard command should be the same as the given project, with underscores replaced by hyphens if needed. (We could consider renaming the project name to use hyphens so that it works as a server name, but that would break existing links, so maybe better to avoid that.)
  4. [ ] The servername for the notebook command should be projectname-notebook, singular as in "Jupyter Notebook", whether or not there are multiple notebooks. Note that the attractors notebook server is actually attractors-notebooks right now, and if it's too painful to update that, please edit this rule to say it's always plural instead. But we must pick one!
  5. [ ] Ideally, we'd have an automated script that checks which deployment command(s) are present in the anaconda-project.yml file, then launches the deployment (killing any existing deployments by that name?). If not, we could at least follow these steps blindly, with no decisions required. Either way, we will need to relaunch every project's deployments.
  6. [ ] After relaunching the deployments, we will need to check (and probably just rebuild) every web page, and confirm that each web page and its deployments are working.
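The naming rules in items 1 to 4 could be sketched in Python roughly as follows. This is only an illustration: `server_name` and `plan_deployments` are hypothetical names, and applying the underscore-to-hyphen replacement to the notebook server name as well is an assumption (rule 3 only states it for the dashboard).

```python
ALLOWED_COMMANDS = ("notebook", "dashboard")
RESOURCE_PROFILE = "Medium"  # rule 1: every project uses the same profile


def server_name(project: str, command: str) -> str:
    """Derive the server name for one command of a project."""
    if command not in ALLOWED_COMMANDS:
        raise ValueError(f"unsupported command: {command!r}")
    # Assumption: hyphenate for both commands, since the result is a hostname.
    base = project.replace("_", "-")
    # Rule 3: dashboard uses the project name; rule 4: notebook adds a suffix.
    return base if command == "dashboard" else f"{base}-notebook"


def plan_deployments(project: str, commands: list) -> dict:
    """Map each command found in a project to the server it should launch on."""
    unknown = sorted(set(commands) - set(ALLOWED_COMMANDS))
    if unknown:
        # Rule 2: names such as "notebooks" or "dashboards" must be renamed.
        raise ValueError(f"{project}: rename or remove commands {unknown}")
    return {c: server_name(project, c) for c in ALLOWED_COMMANDS if c in commands}


print(plan_deployments("ship_traffic", ["dashboard", "notebook"]))
print(plan_deployments("attractors", ["notebook"]))
```

A script along these lines would still need to read the command names out of each anaconda-project.yml and call whatever interface the deployment server provides; neither step is shown here.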

Whew! But this all does need to be done, as nearly every project currently needs redeploying and to have its web page rebuilt, so it may as well all be made consistent first.

jbednar, Aug 27 '21 22:08

I now vote for notebook and dashboard as command names (singular) as that is the common case and not unreasonable even if there are multiple notebooks (which is rarer).

One thing about automating deployment: I would still want some declaration in the yaml to say whether or not it should have a live deployment at all, even if everything else can be specified by convention.

jlstevens, Aug 30 '21 10:08

Sounds good. In principle, one might want such a declaration to apply to either the dashboard or the notebook target specifically. But that's difficult because anaconda-project commands don't allow user fields, and in practice, I don't think we're likely to write a dashboard if we don't also want to deploy it. So I think it's sufficient to have a global user field enabling or disabling all deployments for that project. Given that we do deploy nearly all projects in practice, such a flag should presumably be opt-out, e.g. no_deployments or auto_deploy: False (defaulting to True).
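A minimal sketch of how such an opt-out flag might be read, assuming the anaconda-project.yml has already been parsed into a dict and that the flag is a top-level key named `auto_deploy`. Both the key name and its placement are assumptions at this point, as is the helper name `should_deploy`.

```python
def should_deploy(project_config: dict) -> bool:
    """Return True unless the project explicitly opts out of deployment.

    project_config is the parsed anaconda-project.yml; the auto_deploy key
    is hypothetical and defaults to True when absent.
    """
    return bool(project_config.get("auto_deploy", True))


# No flag present: the project is deployed by default.
print(should_deploy({"name": "attractors"}))
# Explicit opt-out disables all deployments for the project.
print(should_deploy({"name": "attractors", "auto_deploy": False}))
```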

jbednar, Aug 30 '21 14:08