How to use pangeo-stacks images with a dask-labextension layout in binder repos?
So I'm trying to work on https://github.com/pangeo-data/pangeo-tutorial-agu-2018/issues/14.
I decided to use the pangeo/pangeo-notebook-onbuild:2019.04.19 Docker image, as found in several recent Pangeo deployments. This seems to work; however, I've lost the dask-labextension layout, and I'm not sure what I should do.
Looking at https://github.com/pangeo-data/pangeo-cloud-federation/tree/staging/deployments/nasa/image/binder or https://github.com/pangeo-data/pangeo-cloud-federation/tree/staging/deployments/ocean/image/binder, there seem to be some postBuild config files, but no dask-labextension layout.
So what is the correct configuration to get a basic Pangeo notebook image with a nice dask-labextension layout?
/cc @ian-r-rose who might know
@guillaumeeb here is a demo I wrote to show how to set up a new layout that works on binder. It takes a bit of work, but is doable. The layout is stored in a jupyterlab-workspace file, which you distribute with the binder repo.
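For reference, the start file in such a setup just needs to import the saved workspace and then hand off to binder's default command. A minimal sketch of that idea (the workspace filename and paths below are illustrative, not taken from the demo):

```python
#!/usr/bin/env python3
# Illustrative binder `start` wrapper written in Python (a start file can be any
# executable). The workspace filename below is a made-up example.
import os
import subprocess
import sys

WORKSPACE = "binder/layout.jupyterlab-workspace"  # hypothetical path

# Load the saved JupyterLab layout into the user's workspace store.
subprocess.check_call(["jupyter", "lab", "workspaces", "import", WORKSPACE])

# Hand control back to whatever command binder/repo2docker wanted to run.
os.execvp(sys.argv[1], sys.argv[1:])
```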
I don't think our current onbuild setup supports the start file syntax, something that is currently baked into how we use the jupyterlab-workspace features.
Q for @yuvipanda - were there challenges getting the start file entrypoint to work or is this a feature we could implement?
Q for @ian-r-rose - have you heard talk of repo2docker supporting the workspace spec as a known configuration file? This may be an interesting proposal that would eliminate the need for the start file in this use case.
@jhamman I have not heard any talk of that, but it's a neat idea. There is currently no formal spec for workspace files (though it would be nice to have one), so it would be up to the user to provide a well-formed one for their particular binder setup. But it would certainly help in cutting down on the boilerplate start script flimflam (which, as we have seen, is pretty error-prone).
@jhamman we can totally support 'start' in onbuild. I didn't implement it mostly to get an MVP out fast. The way to do that would be:
- Implement our own Entrypoint that is called all the time
- If we have a custom start file, it'll call that. If not, it'll just fall back to the default command being called.
Basically, we need to re-implement https://github.com/jupyter/repo2docker/blob/master/repo2docker/buildpacks/repo2docker-entrypoint in r2d_overlay.py.
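Roughly, something along these lines (a sketch only, with assumed helper names and paths; not the actual r2d_overlay.py code):

```python
#!/usr/bin/env python3
# Sketch of the proposed entrypoint logic; helper names and paths are assumptions.
import os
import sys


def binder_path(name):
    # repo2docker-style lookup: prefer binder/<name>, fall back to the repo root.
    for prefix in ("binder", "."):
        candidate = os.path.join(prefix, name)
        if os.path.exists(candidate):
            return candidate
    return None


def main():
    start = binder_path("start")
    if start:
        # Custom start file: make sure it is executable, then let it take over.
        # (It is expected to end with `exec "$@"`, as repo2docker documents.)
        os.chmod(start, 0o755)
        os.execv(os.path.abspath(start), [start] + sys.argv[1:])
    else:
        # No start file: fall back to running the default command unchanged.
        os.execvp(sys.argv[1], sys.argv[1:])


if __name__ == "__main__":
    main()
```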
IMO, the bug is possibly in workspaces needing base_url. See https://github.com/jupyterlab/jupyterlab/issues/5977 for more details. Changing that would fix the start-related issues, and also make this much more robust in a lot of use cases. Based on https://github.com/jupyterlab/jupyterlab/issues/5977#issuecomment-465864078 it's unclear why it is needed :)
Hey, I am having a hack at this. See https://github.com/scollis/pangeo-stacks/blob/addstart/onbuild/r2d_overlay.py#L112
One thing I don't understand (I am a Docker noob) is where to put it here: https://github.com/pangeo-data/pangeo-stacks/blob/4c90b98836c66403ab81ca837ce979ec9628a232/onbuild/Dockerfile#L15
Is it `ENTRYPOINT RUN /usr/local/bin/r2d_overlay.py start`?
@scollis something like that! One addition to your start script would be to make sure it works when there's no 'start' script present. In that case, it should default to calling /usr/local/bin/repo2docker-entrypoint (https://github.com/jupyter/repo2docker/blob/master/repo2docker/buildpacks/base.py#L182) which will default to what repo2docker does.
You should probably also just pass the path directly instead of passing it as an arg to /bin/bash.
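In the Python overlay that difference looks roughly like this (the path is illustrative):

```python
import subprocess

start = "/home/jovyan/binder/start"  # illustrative fully qualified path

# Wrapping the script in a shell forces bash and ignores its shebang:
subprocess.check_call(["/bin/bash", start])

# Invoking the (chmod +x'ed) path directly lets the script's own shebang decide:
subprocess.check_call([start])
```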
Thank you for working on this!
@yuvipanda if a start script is present, should it run it and then run /usr/local/bin/repo2docker-entrypoint?
@scollis I think it should only run repo2docker-entrypoint if a start script is not present...
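In other words, it is one branch or the other, never both. A compact sketch of that decision (the start path here is an assumption; the real overlay resolves it more carefully):

```python
import os
import sys

R2D_ENTRYPOINT = "/usr/local/bin/repo2docker-entrypoint"
START = "binder/start"  # simplified assumption for this sketch

if os.path.exists(START):
    # Start script present: it takes over (and should exec "$@" itself).
    os.execv(os.path.abspath(START), [START] + sys.argv[1:])
else:
    # No start script: behave exactly like a plain repo2docker image.
    os.execv(R2D_ENTRYPOINT, [R2D_ENTRYPOINT] + sys.argv[1:])
```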
Awesome. I am at ORNL and just about to leave, so I'm pushing a Docker image to Docker Hub now; I don't think the hotel wifi can handle a 30 GB upload once I'm back :D
@yuvipanda "You should probably also just pass the path directly instead of passing it as an arg to /bin/bash."
I am copying what is done in postbuild..
so you are saying I should do
```python
# Snippet for r2d_overlay.py -- `become`, `NB_UID`, and `binder_path` are
# defined earlier in that file.
import os
import subprocess


@become(NB_UID)
def apply_start():
    st_path = binder_path('start')
    if os.path.exists(st_path):
        return [
            f'chmod +x {st_path}',
            # since st_path is a fully qualified path, no need to add a ./
            f'{st_path}'
        ]


# Enable additional actions in the future
applicators = [apply_start]

for applicator in applicators:
    commands = applicator()
    if commands:
        for command in commands:
            subprocess.check_call(
                command, shell=True, preexec_fn=applicator._pre_exec
            )
```
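(The snippet relies on helpers defined earlier in r2d_overlay.py. For context, the `become` decorator plausibly looks something like the following; this is an assumption, not the actual implementation:)

```python
import os


def become(uid):
    # Attach a preexec_fn that drops the child process to the given uid.
    # Plausible shape only; the real decorator in r2d_overlay.py may differ.
    def decorator(func):
        func._pre_exec = lambda: os.setuid(uid)
        return func
    return decorator
```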