data-safe-haven Pulumi: minimal required workspace software

:white_check_mark: Checklist

[x] I have searched open and closed issues for duplicates.
[x] This is a request for a new feature in the Data Safe Haven or an upgrade to an existing feature.
[x] The feature is still missing in the latest version.
[x] I have read through the documentation.
[x] This isn't an open-ended question (open a discussion if it is).

:strawberry: Suggested change

Currently the workspaces have very little installed: R, Python and libraries to interact with databases. What does a minimal useful workspace look like? Ideally it would be small enough that we don't need to pre-build an image (either with the current scripts or with bureau).

:steam_locomotive: How could this be done?

Aug 24 '23 21:08 jemrobinson

I would really like to do this with bureau and start using it sooner rather than later.

I would also like to keep bureau useful outside of DSH, with a (probably smaller) DSVM-style image. The pragmatic solution might be to add a new SKU to bureau, DSH_DSVM, which adds the desktop, GPU drivers and extra software. That way it can always be migrated elsewhere if we want, and we don't have to worry about integrating configuration management into the Pulumi code just yet.

Aug 30 '23 17:08 JimMadge

I've actually been wondering whether we need a pre-built image at all. If we do use a pre-built image it should definitely be built with bureau, but is a VM with just Python, R and a few other tools good enough? The advantage of a VM that's built on the fly is that it's easier to test and should always be up to date with the rest of the code.

Aug 31 '23 07:08 jemrobinson

I think the issue with building on demand is that it will waste time and energy.

It depends how far the SRD image is from the image it is built on. If we start from a headless image and install X/Wayland, a desktop environment, GPU drivers and libraries, programming languages and graphical programs, I would guess it would take around half an hour. For Mousehole, at least, this was always the majority of the deployment time.

Another way would be to use the bureau workflows to build images on a regular basis (say weekly) and allow for manually dispatching a build.

Aug 31 '23 11:08 JimMadge

Another possibility is to use a Docker container as the source. I was sceptical, but @manics says that this works for him.

Sep 05 '23 16:09 jemrobinson

Here's an example of an Ubuntu 22.04 MATE desktop with VNC running in Docker: https://github.com/manics/jupyter-guacamole/

Run `docker run -it --rm -p 5901:5901 -e LOCALHOST=no ghcr.io/manics/ubuntu-mate-vnc:main` and connect to localhost:5901 with your VNC client.

Sep 25 '23 10:09 manics

A good-enough-for-now approach while we wait for bureau:

An Ansible playbook describing the desired state of the workspaces
Ansible Runner, ansible-pull or similar to apply the configuration to each deployed workspace
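To make the two pieces above concrete, here is a minimal sketch in shell. It assumes an apt-based (Debian/Ubuntu) workspace, a playbook called `workspace.yaml` kept in a git repository, and an illustrative package list; the function names, file name and packages are hypothetical and not part of the Data Safe Haven code. The playbook targets `localhost` with a local connection so that each workspace configures itself.

```shell
#!/bin/sh
# Hypothetical sketch: workspace desired state as an Ansible playbook,
# applied on the workspace itself with ansible-pull.
set -eu

# Write a minimal playbook describing the desired workspace state.
# Assumes an apt-based image; the package list is illustrative only.
write_playbook() {
    cat > "$1" <<'EOF'
- name: Configure workspace
  hosts: localhost
  connection: local
  become: true
  tasks:
    - name: Ensure baseline software is installed
      ansible.builtin.apt:
        name:
          - python3
          - r-base
        state: present
        update_cache: true
EOF
}

# Usage: apply_with_ansible_pull REPO_URL PLAYBOOK
# Checks out the repository and runs the playbook from it.
# --only-if-changed skips the run when the repository has not changed.
apply_with_ansible_pull() {
    ansible-pull --url "$1" --only-if-changed "$2"
}
```

Dropping `--only-if-changed` would re-apply the playbook on every run, which is what actually corrects drift on a workspace; keeping it makes runs cheaper but only reacts to pushed changes.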

Apr 16 '24 15:04 JimMadge

ansible-pull is appealing to me.

We could have a cron job or systemd timer run the playbooks regularly, which would enforce the desired state. It would also be possible to update deployed workspaces by pushing changes to the playbook.
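A systemd timer is one way to run the playbook regularly. The sketch below writes a oneshot service plus a matching timer; the unit name `workspace-config`, the hourly interval, the `/usr/bin/ansible-pull` path and the repository URL are all assumptions for illustration, not existing DSH settings.

```shell
#!/bin/sh
# Hypothetical sketch: write a systemd service + timer that re-applies the
# workspace playbook periodically, so drift is corrected and pushed changes land.
set -eu

# Usage: write_units UNIT_DIR REPO_URL PLAYBOOK
write_units() {
    unit_dir="$1"
    repo_url="$2"
    playbook="$3"

    # Oneshot service: runs ansible-pull once and exits.
    # Assumes ansible-pull is installed at /usr/bin/ansible-pull.
    cat > "${unit_dir}/workspace-config.service" <<EOF
[Unit]
Description=Apply workspace configuration with Ansible
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/bin/ansible-pull --url ${repo_url} ${playbook}
EOF

    # Timer: first run shortly after boot, then hourly, with some jitter
    # so that many workspaces do not all run at the same moment.
    cat > "${unit_dir}/workspace-config.timer" <<'EOF'
[Unit]
Description=Periodically apply workspace configuration

[Timer]
OnBootSec=5min
OnUnitActiveSec=1h
RandomizedDelaySec=10min

[Install]
WantedBy=timers.target
EOF
}
```

After writing the units to `/etc/systemd/system`, running `systemctl daemon-reload` and then `systemctl enable --now workspace-config.timer` would activate the schedule. A cron entry calling the same `ansible-pull` command would work too; the timer has the advantage of logging each run to the journal.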

ansible-pull pulls from a git repository. We could skip that and pull from somewhere else instead. It would be good to have something like the playbook in a git repo or blob inside the TRE, which admins could push to from outside and workspaces could fetch from inside.

@craddm @jemrobinson thoughts?

I think the sensible solution for now may be to create a storage container inside each SRE holding the playbook, plus a script that regularly fetches it and runs Ansible.
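A rough sketch of that script: copy the playbook out of the SRE's container (assumed here to be available as a local directory, e.g. a hypothetical mount point such as `/mnt/workspace-config`) into a local working directory, then run it against the workspace itself with `ansible-playbook`. The paths, the `workspace.yaml` file name and the `ANSIBLE_PLAYBOOK` dry-run override are assumptions, not existing DSH conventions.

```shell
#!/bin/sh
# Hypothetical sketch: fetch the playbook from the SRE container and apply it locally.
set -eu

# Command used to run playbooks; set ANSIBLE_PLAYBOOK=echo for a dry run
# that prints the arguments instead of running Ansible.
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"

# Usage: fetch_and_apply SOURCE_DIR WORK_DIR
fetch_and_apply() {
    src_dir="$1"
    work_dir="$2"

    # Copy the container contents locally so Ansible works from a local
    # snapshot rather than directly from shared storage.
    mkdir -p "${work_dir}"
    cp -R "${src_dir}/." "${work_dir}/"

    # Apply the playbook to this machine only, without SSH.
    "${ANSIBLE_PLAYBOOK}" --connection=local --inventory localhost, \
        "${work_dir}/workspace.yaml"
}
```

Run from a timer or cron job, this re-applies the playbook on every run, so it both enforces the desired state and picks up whatever admins last pushed to the container.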

May 16 '24 12:05 JimMadge

Mounting an Azure storage container into a VM is relatively easy (mounting over NFSv3 doesn't need credentials). Connecting to a container to pull a file will be more complicated (it will likely need a managed identity for the VM resource, with that identity granted permissions on the container).
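For reference, an NFSv3 mount of a Blob container is a single `mount` call. This sketch assumes a storage account with hierarchical namespace and NFSv3 enabled, reachable from the VM's virtual network (NFSv3 access is controlled by network rules rather than credentials), and the `nfs-common` client installed; the account name, container name and mount point are placeholders, and the mount options should be checked against Azure's current NFSv3 documentation.

```shell
#!/bin/sh
# Hypothetical sketch: mount an Azure Blob container over NFSv3 (no credentials;
# access relies on the VM's network being allowed to reach the storage account).
set -eu

# Build the NFS export path: <account>.blob.core.windows.net:/<account>/<container>
nfs_source() {
    printf '%s.blob.core.windows.net:/%s/%s\n' "$1" "$1" "$2"
}

# Usage: mount_container ACCOUNT CONTAINER MOUNT_POINT  (needs root)
mount_container() {
    mkdir -p "$3"
    mount -t nfs -o sec=sys,vers=3,nolock,proto=tcp "$(nfs_source "$1" "$2")" "$3"
}
```

Mounted read-only on the workspaces, a container like this would let admins push a new playbook from outside while the workspaces only ever read it.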

May 16 '24 13:05 jemrobinson

Pushing to a mounted container seems like a good solution, then.

May 17 '24 09:05 JimMadge

I was thinking about this earlier today and I think it might also work to pull from a non-mounted container that is locked down from public access but available anonymously for private access.
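If the container allowed anonymous read access while the storage account's network rules only admitted traffic from inside the SRE (for example through a private endpoint, where the usual `blob.core.windows.net` hostname resolves to a private address), a workspace could fetch the playbook with a plain HTTPS request and no credentials. A minimal sketch, where the account, container and blob names are placeholders and the network restriction is assumed to be configured separately:

```shell
#!/bin/sh
# Hypothetical sketch: anonymously download one blob from a network-restricted container.
set -eu

# Build the Blob endpoint URL for a single blob.
blob_url() {
    printf 'https://%s.blob.core.windows.net/%s/%s\n' "$1" "$2" "$3"
}

# Usage: fetch_blob ACCOUNT CONTAINER BLOB DEST
# --fail makes curl exit non-zero on HTTP errors (e.g. 403/404), so a
# network or permission problem is not silently saved as the playbook.
fetch_blob() {
    curl --fail --silent --show-error --output "$4" "$(blob_url "$1" "$2" "$3")"
}
```

Compared with the NFS mount, this avoids keeping a mount on every workspace, at the cost of allowing anonymous reads on the storage account.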

May 17 '24 09:05 jemrobinson
