python-pachyderm
prototype `python_pachyderm.run_like_a_pipeline`
An example usage:
--- cell ---
def pl_body():
    with open("/pfs/big", "r") as f:
        # do stuff with f
        pass

run_like_a_pipeline(
    datums=["big:/", "config:/cfgfile1.txt"],
    code=pl_body)
--- output ---
output is in '/data/a1b2c3'
--- cell ---
matplotlib.plot("/data/a1b2c3")
--- output ---
<graph>
---
Our next goal for this prototype is to get <User> to use it for debugging failed datums. They specifically mentioned this as a sticking point they're struggling with, and hopefully the prototype will significantly reduce their iteration time.
Following up on our conversation about this yesterday:
1. `run_like_a_pipeline` should, at a minimum, allow you to specify a `(pipeline, datum)`, download the files in that datum, and mount them into a container running locally (also in that local container: `/pfs/out` should be a bind-mounted tmp dir where you can see the output from processing that datum).
2. If users specify `code=pl_body`, then the container in (1) is some bog-standard Python container, and the `command` becomes, essentially, `python -c <function body>`, a la Kubeflow function-based components. Otherwise, we could use the pipeline image, or maybe allow users to specify their own Python image and run their code in that.
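To make the two points above concrete, here is a rough sketch of how the local container invocation might be assembled. It is not the prototype's actual implementation: the helper names `function_to_command` and `build_docker_argv`, the `input_dirs` mapping, and the default image `python:3.11-slim` are all assumptions made up for illustration, and the step that downloads a datum's files from Pachyderm is left out entirely. The sketch only builds the `docker run` argument list; it does not start a container.

```python
import inspect
import tempfile
import textwrap


def function_to_command(func):
    """Turn a zero-argument function into a `python -c <source>` argv list."""
    # Grab the function's source text and strip any common indentation.
    source = textwrap.dedent(inspect.getsource(func))
    # Append a call so the function actually runs when the script executes.
    script = f"{source}\n{func.__name__}()\n"
    return ["python", "-c", script]


def build_docker_argv(input_dirs, func, image="python:3.11-slim"):
    """Build (but do not run) a `docker run` argv for one datum.

    input_dirs is a hypothetical mapping of repo name -> local directory
    that already holds the downloaded files for that datum.
    """
    # Temporary host directory that will appear as /pfs/out in the container.
    out_dir = tempfile.mkdtemp(prefix="pfs-out-")
    argv = ["docker", "run", "--rm"]
    for repo, local_dir in input_dirs.items():
        # Inputs are bind-mounted read-only under /pfs/<repo>.
        argv += ["-v", f"{local_dir}:/pfs/{repo}:ro"]
    # Output is bind-mounted read-write so results are visible on the host.
    argv += ["-v", f"{out_dir}:/pfs/out"]
    argv += [image] + function_to_command(func)
    return argv, out_dir
```

With something like this, `run_like_a_pipeline` could pass the resulting list to `subprocess.run` and then report `out_dir` as the place to look for output. Two caveats: `inspect.getsource` needs the function's source to be available (a file, or generally a notebook cell), and the function body has to be self-contained, with its imports inside the function, since only its source is shipped to the container.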