Local pipelines using docker POC

Open kevin-bates opened this issue 3 years ago • 3 comments

This pull request is a proof of concept for what it might take to support using the node's associated container image when running pipelines locally. This may benefit users whose container images offer greater functionality or more resources than what is available directly on their desktop.

Its current functionality and limitations are listed below. Again, this is only a proof of concept.

  1. This functionality is not enabled by default. It can be enabled with a configuration option, either on the command line when jupyter lab is started or within a configuration file. To enable on the command line, add --LocalPipelineProcessor.use_docker_image=True. Within a configuration file: c.LocalPipelineProcessor.use_docker_image=True.
  2. By default, containers are removed following the node's completion. You can prevent their removal (typically for debugging purposes) using a similar configuration option: --LocalPipelineProcessor.remove_container=False
  3. For Notebook nodes, the command issued will include an attempt to pip install papermill via a bash command. As a result, the container requires both bash and pip commands.
  4. For Python script nodes the python3 command is issued and assumed to be present within the container's path.
  5. For R script nodes, the Rscript command is issued and assumed to be present within the container's path.
  6. The docker client API is used to process the nodes (when configured). This dependency is lazily bound for now, so you'll need to pip install docker to avoid an import issue.
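For illustration, here is a minimal sketch of how items 2 through 6 could fit together. It is not the code in this pull request: the function names, the /work mount point, and running papermill in place on the same notebook are all assumptions made for the example. Only the commands (bash, pip, papermill, python3, Rscript), the remove-on-completion behavior, and the lazily imported docker package come from the description above.

```python
import shlex
from typing import List


def build_node_command(node_file: str) -> List[str]:
    """Return the command to issue inside the container for one node.

    Hypothetical helper: dispatching on the file extension is an assumption.
    """
    lowered = node_file.lower()
    if lowered.endswith(".ipynb"):
        # Notebook nodes: pip install papermill via bash, then execute the
        # notebook. The image therefore needs both bash and pip.
        quoted = shlex.quote(node_file)
        script = f"pip install papermill && papermill {quoted} {quoted}"
        return ["bash", "-c", script]
    if lowered.endswith(".py"):
        # Python script nodes: python3 must be on the container's path.
        return ["python3", node_file]
    if lowered.endswith(".r"):
        # R script nodes: Rscript must be on the container's path.
        return ["Rscript", node_file]
    raise ValueError(f"Unsupported node file type: {node_file}")


def run_node_in_container(
    image: str,
    node_file: str,
    work_dir: str,
    remove_container: bool = True,
) -> str:
    """Run one node in its container image and return the captured stdout.

    Hypothetical helper: the /work bind mount is an assumption.
    """
    # Lazy import, so the docker package is only needed when this path is used.
    import docker

    client = docker.from_env()
    output = client.containers.run(
        image,
        command=build_node_command(node_file),
        volumes={work_dir: {"bind": "/work", "mode": "rw"}},
        working_dir="/work",
        remove=remove_container,  # keep the container around when False
    )
    return output.decode("utf-8", errors="replace")
```

With a running Docker daemon and the docker package installed, a call such as run_node_in_container("python:3.9", "load_data.py", "/path/to/pipeline") would process a single Python node; passing remove_container=False leaves the container in place for debugging. The image name and paths here are placeholders.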

I've found that using use_docker_image=True adds about 10 seconds per node, probably due to container startup time. That's with all images already local, so I would expect further delays when an image must be pulled first, although the per-node cost should be relatively constant.

Should we decide to adopt this as another option for local processing, I would suggest we promote the 'local' processor to be a full-fledged runtime. I believe this will also simplify multiple areas of the code that currently special-case local processing. The 'local' schema would expose these two properties (use_docker_image and remove_container) as runtime-configurable properties, among other things.

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

kevin-bates avatar Nov 24 '21 00:11 kevin-bates

Thanks for making a pull request to Elyra!

To try out this branch on binder, follow this link: Binder

elyra-bot[bot] avatar Nov 24 '21 00:11 elyra-bot[bot]

Should we decide to adopt this as another option for local processing, I would suggest we promote the 'local' processor to be a full-fledged runtime. I believe this will also simplify multiple areas of the code that currently special-case local processing.

This could also be treated as a new processor instead of a replacement for 'local' and serve as a blueprint for BYO runtime.

ptitzler avatar Nov 24 '21 20:11 ptitzler

I'm taking this out of draft and adding the WIP label, since I suspect draft mode might be interfering with our ability to discuss this proof of concept.

kevin-bates avatar Feb 01 '22 19:02 kevin-bates

We've decided not to promote local as a full runtime - closing.

kevin-bates avatar Oct 26 '22 15:10 kevin-bates