elyra
Local pipelines using docker POC
This pull request is a proof of concept for what it might take to support using a node's associated container image when running pipelines locally. This may benefit users whose container images offer more functionality or resources than are available directly on their desktop.
Here are its current limitations and functionality and, again, this is only a proof of concept.
- This functionality is not enabled by default. It can be enabled with a configuration option, either on the command line when jupyter lab is started or within a configuration file. To enable it on the command line, add `--LocalPipelineProcessor.use_docker_image=True`. Within a configuration file: `c.LocalPipelineProcessor.use_docker_image=True`.
- By default, containers are removed following the node's completion. You can prevent their removal (typically for debugging purposes) using a similar configuration option: `--LocalPipelineProcessor.remove_container=False`
- For Notebook nodes, the command issued will include an attempt to `pip install papermill` via a `bash` command. As a result, the container requires both the `bash` and `pip` commands.
- For Python script nodes, the `python3` command is issued and assumed to be present within the container's path.
- For R script nodes, the `Rscript` command is issued and assumed to be present within the container's path.
- The docker client API is used to process the nodes (when configured). This dependency is lazily bound for now, so you'll need to `pip install docker` to avoid an import issue.
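As a rough illustration of the per-node behavior listed above (not the code in this pull request), the command selection and the lazily bound docker dependency might look like the sketch below. The function names, the `/work` mount point, and the exact `papermill` invocation are assumptions made for the example; only the `bash`, `pip install papermill`, `python3`, and `Rscript` commands come from the description.

```python
import shlex
from typing import List


def build_node_command(filename: str) -> List[str]:
    """Return the command to issue inside the container for one node (hypothetical helper)."""
    if filename.endswith(".ipynb"):
        quoted = shlex.quote(filename)
        # Notebook nodes: try to install papermill via bash, then run the notebook.
        # Executing the notebook in place is an assumption of this sketch.
        return ["bash", "-c", f"pip install papermill && papermill {quoted} {quoted}"]
    if filename.endswith(".py"):
        # Python script nodes: python3 is assumed to be on the container's path.
        return ["python3", filename]
    if filename.lower().endswith(".r"):
        # R script nodes: Rscript is assumed to be on the container's path.
        return ["Rscript", filename]
    raise ValueError(f"Unsupported node file type: {filename}")


def run_node_in_container(image: str, filename: str, workdir: str,
                          remove_container: bool = True) -> None:
    """Run one node in its container image (hypothetical helper)."""
    # Lazy import: the docker package is only required when this code path is used.
    import docker

    client = docker.from_env()
    client.containers.run(
        image,
        command=build_node_command(filename),
        # Mount the local working directory so the node can read and write its files.
        volumes={workdir: {"bind": "/work", "mode": "rw"}},
        working_dir="/work",
        remove=remove_container,  # mirrors the remove_container option
    )
```

Keeping the `import docker` inside the function is one way to get the lazy binding described above: users who never enable `use_docker_image` do not need the package installed.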
I've found that using `use_docker_image=True` adds about 10 seconds per node, probably due to container startup time. That's with all images already local, so I would expect further delays otherwise, but the cost should be relatively constant.
Should we decide to adopt this as another option for local processing, I would suggest we promote the 'local' processor to be a full-fledged runtime. I believe this will also simplify multiple areas of the code that currently special-case local processing. The 'local' schema would expose these two properties (`use_docker_image` and `remove_container`) as runtime-configurable properties, among other things.
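To make the idea of runtime-configurable properties a little more concrete, here is a minimal sketch of what such a 'local' schema fragment and its resolution could look like. The dictionary layout and the `resolve_local_settings` helper are hypothetical and do not reflect Elyra's actual schema format; only the two property names and their defaults (docker use off, container removal on) come from the description above.

```python
from typing import Any, Dict

# Hypothetical schema fragment for a 'local' runtime; layout is illustrative only.
LOCAL_RUNTIME_PROPERTIES: Dict[str, Dict[str, Any]] = {
    "use_docker_image": {
        "type": "boolean",
        "default": False,  # the functionality is not enabled by default
        "description": "Run each node in its associated container image",
    },
    "remove_container": {
        "type": "boolean",
        "default": True,  # containers are removed after the node completes
        "description": "Remove the container once the node has finished",
    },
}


def resolve_local_settings(overrides: Dict[str, Any]) -> Dict[str, bool]:
    """Merge user-supplied values over the schema defaults (hypothetical helper)."""
    settings = {name: spec["default"] for name, spec in LOCAL_RUNTIME_PROPERTIES.items()}
    for name, value in overrides.items():
        if name not in LOCAL_RUNTIME_PROPERTIES:
            raise KeyError(f"Unknown local runtime property: {name}")
        if not isinstance(value, bool):
            raise TypeError(f"Property {name} must be a boolean")
        settings[name] = value
    return settings
```

With this shape, a runtime configuration that sets only `use_docker_image` would still pick up the default for `remove_container`.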
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the Apache License 2.0; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
> Should we decide to adopt this as another option for local processing, I would suggest we promote the 'local' processor to be a full-fledged runtime. I believe this will also simplify multiple areas of the code that currently special-case local processing.
This could also be treated as a new processor instead of a replacement for 'local' and serve as a blueprint for BYO runtime.
I'm taking this out of draft and adding the WIP label, since I suspect draft mode might be interfering with our ability to discuss this proof of concept.
We've decided not to promote local as a full runtime - closing.