jupyterhub-on-hadoop
Manual Installation in air gap environments
The documentation assumes that the cluster can access the public internet. This may not be the case in practice. I am not sure whether air-gapped installation is in scope for this project, but I thought I'd flag it here.
How do people normally handle this? Searching Cloudera's documentation, I also couldn't find anything about air-gapped installs.
The CDSW documentation covers air-gapped installation here: https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_install.html
For CSD-based installs in an airgapped environment, put the Cloudera Data Science Workbench parcel into a new hosted or local parcel repository, and then configure the Cloudera Manager Server to target this newly-created repository.
Could this be done by targeting a local conda repository with required packages?
It could. Or we could build RPMs (#8), or use conda-pack to package the environment for transport. There are lots of things that could work, I'm just not sure what's best.
@hussainsultan would creating a parcel help solve the software distribution problem?
If that is the case, the easiest way I can find to create one is by using conda-pack. Let me hack something really quick and post it back here.
@sodre creating a parcel will solve this issue for Cloudera-managed Hadoop clusters, but I am not sure that's the most general answer, as @jcrist mentioned. Perhaps the best answer is just to document one of the ways to do an offline install, e.g. using conda-pack to create a tarball and pushing it to an edge node.
apologies for the delay.
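For reference, the conda-pack route suggested above could look roughly like the sketch below. It only builds and prints the commands for each step (pack on a machine with internet access, copy to the edge node, unpack there) and does not run anything. The environment name `jupyterhub`, the host `edge-node`, and the path `/opt/jupyterhub` are placeholders I made up, and using `scp` for the transfer is an assumption; any transfer method allowed in the air-gapped environment would do.

```python
import shlex
from typing import List


def pack_command(env_name: str, archive: str) -> List[str]:
    """Run on a machine WITH internet access: archive an existing conda env."""
    return ["conda", "pack", "-n", env_name, "-o", archive]


def copy_command(archive: str, host: str, remote_dir: str) -> List[str]:
    """Copy the archive to the edge node (scp is an assumption, not a requirement)."""
    return ["scp", archive, f"{host}:{remote_dir}/"]


def unpack_commands(archive_path: str, target_dir: str) -> List[List[str]]:
    """Run on the edge node: extract the env, then fix up hard-coded prefixes."""
    activate = shlex.quote(f"{target_dir}/bin/activate")
    return [
        ["mkdir", "-p", target_dir],
        ["tar", "-xzf", archive_path, "-C", target_dir],
        # conda-unpack ships inside the packed environment and rewrites prefixes.
        ["bash", "-c", f"source {activate} && conda-unpack"],
    ]


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    env, archive = "jupyterhub", "jupyterhub.tar.gz"
    steps = [pack_command(env, archive), copy_command(archive, "edge-node", "/tmp")]
    steps += unpack_commands(f"/tmp/{archive}", "/opt/jupyterhub")
    for step in steps:
        print(shlex.join(step))
```

The packed environment has to be built on the same OS and architecture as the edge node, since conda-pack archives the binaries as they are.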