prototype `python_pachyderm.run_like_a_pipeline`

Open albscui opened this issue 4 years ago • 2 comments

An example usage:

--- cell ---
def pl_body():
  with open("/pfs/big", "r") as f:
    # do stuff with f
    ...

run_like_a_pipeline(
  datums=["big:/", "config:/cfgfile1.txt"],
  code=pl_body)
--- output ---
output is in '/data/a1b2c3'
--- cell ---
matplotlib.plot("/data/a1b2c3")
--- output ---
<graph>
---

albscui avatar Aug 09 '21 22:08 albscui

Our next goal for this prototype is to get <User> to use it for debugging failed datums. They specifically mentioned that as a sticking point they're struggling with, and hopefully this will significantly reduce their iteration time.

msteffen avatar Aug 25 '21 15:08 msteffen

Following up on our conversation about this yesterday:

  1. run_like_a_pipeline should, at a minimum, let you specify a (pipeline, datum) pair, download the files in that datum, and mount them into a container running locally. In that same container, /pfs/out should be a bind-mounted temp dir where you can see the output from processing that datum.
  2. If users specify code=pl_body, then the container in (1) is a stock Python container, and the command becomes essentially python -c <function body>, à la Kubeflow function-based components. Otherwise, we could use the pipeline image, or maybe let users specify their own Python image and run their code in that.
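
A rough sketch of how (1) and (2) could fit together, to make the shape concrete. Nothing here is existing python_pachyderm API: `build_docker_command`, its parameters, and the `python:3.9-slim` default image are hypothetical names chosen for illustration. It assumes the datum's files have already been downloaded into local directories (one per input repo), and it only assembles the `docker run` argument list rather than executing it.

```python
import shlex
import tempfile
from typing import Dict, List, Optional


def build_docker_command(
    datum_dirs: Dict[str, str],
    out_dir: str,
    image: str = "python:3.9-slim",  # assumed default, not a project decision
    function_source: Optional[str] = None,
    function_name: Optional[str] = None,
) -> List[str]:
    """Assemble a `docker run` argv that mimics a worker's /pfs layout.

    datum_dirs maps an input repo name to a local directory that already
    holds that datum's files (downloading them is out of scope here).
    """
    if function_source is not None and not function_name:
        raise ValueError("function_name is required when function_source is given")

    cmd = ["docker", "run", "--rm"]
    for repo, local_dir in sorted(datum_dirs.items()):
        # Each input is bind-mounted read-only at /pfs/<repo>.
        cmd += ["-v", f"{local_dir}:/pfs/{repo}:ro"]
    # /pfs/out is a writable bind mount so results are visible on the host.
    cmd += ["-v", f"{out_dir}:/pfs/out"]
    cmd.append(image)

    if function_source is not None:
        # Point (2): ship the function's source text and call it via `python -c`.
        program = f"{function_source}\n{function_name}()\n"
        cmd += ["python", "-c", program]
    # Otherwise no command is appended and the image's default command runs
    # (a simplification: a real pipeline's command comes from its spec).
    return cmd


# Example: a throwaway output dir and a tiny stand-in for pl_body.
out_dir = tempfile.mkdtemp(prefix="pfs-out-")
source = 'def pl_body():\n    print("processing /pfs/big")\n'
cmd = build_docker_command(
    {"big": "/tmp/datum/big"},  # hypothetical local path of the downloaded datum
    out_dir,
    function_source=source,
    function_name="pl_body",
)
print(shlex.join(cmd))
```

Returning the argv instead of running it keeps the sketch testable without Docker; a real prototype would pass the list to something like `subprocess.run` and then report `out_dir` back to the notebook.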

msteffen avatar Sep 23 '21 01:09 msteffen