bacalhau
CLI design
Presently the following format might be quite confusing for folks who don't use docker, because its format is inspired by kubectl exec:
cid=$(ipfs add myscript.py)
bacalhau run \
-v $cid:/myscript.py \
python:3 -- python3 /myscript.py
We should have, as a baseline, explicit flags for each thing:
cid=$(ipfs add myscript.py)
bacalhau run \
-v $cid:/myscript.py \
--image python:3 \
--command "python3 /myscript.py"
Here are some examples of more friendly cli commands:
bacalhau run app.py # Detects python, uses specific python version to run as an image
bacalhau run app.py --params="{JSON BLOB}" # Interprets a json blob as kv to inject as cli args? who knows.
bacalhau run app.py --params=var=foo --params=var1=bar # Same as above, but explicit
bacalhau run app.py --python="some sem ver" # Uses the python base image from library/python:"some sem ver"
bacalhau run app.py --image=foobaz/qaz:1.3.4 # Uses a new base image, as specified, overrides detection
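A minimal sketch of how the two --params forms above could be normalised into one list of cli args. The helper name params_to_cli_args and the "--key value" output shape are assumptions for illustration only; the thread has not decided how params get injected.

```python
import json


def params_to_cli_args(params):
    """Flatten repeated --params values into cli args (hypothetical helper).

    Accepts both forms from the examples above:
      --params='{"var": "foo"}'   (JSON blob)
      --params=var=foo            (explicit key=value)
    """
    pairs = {}
    for value in params:
        if value.lstrip().startswith("{"):
            # JSON blob form: merge every key/value pair it contains
            pairs.update(json.loads(value))
        else:
            # Explicit form: split on the first "=" only
            key, _, val = value.partition("=")
            pairs[key] = val

    args = []
    for key, val in pairs.items():
        # Assumption: each pair is injected as "--key value"
        args += [f"--{key}", str(val)]
    return args


print(params_to_cli_args(["var=foo", "var1=bar"]))
```

Both forms end up as the same list, so the rest of the cli would only need to handle one shape.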
we have:
- named specific flags
- an array of the other arguments
so this:
bacalhau run myscript.py
can be turned into this automatically:
bacalhau run \
--volume $cid:/myscript.py \
--image python:3 \
--command "python3 /myscript.py"
because the cli sees one argument that is named (myscript.py) and so does the following things:
- auto detects it's a python script and checks it can find it on the local file system
- uploads that python script to an ipfs node (potentially via the requestor node)
- turns the python script into a CID mounted volume
- calls the expanded run command above
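The steps above can be sketched roughly as follows. This is only an illustration: expand_shortcut and the RUNTIMES table are hypothetical names, the upload to ipfs is left out (the CID is passed in), and only the python mapping comes from this thread.

```python
import os

# Hypothetical extension-to-runtime table; only the .py entry is from the thread.
RUNTIMES = {".py": ("python:3", "python3")}


def expand_shortcut(script_path, cid):
    """Expand `bacalhau run <script>` into the explicit flag form.

    `cid` is assumed to be the result of uploading the script to ipfs,
    which is not done here.
    """
    name = os.path.basename(script_path)
    ext = os.path.splitext(name)[1]
    if ext not in RUNTIMES:
        raise ValueError(f"cannot auto-detect a runtime for {name!r}")

    image, interpreter = RUNTIMES[ext]
    mount = f"/{name}"  # mount the script at the container root, as in the example
    return [
        "bacalhau", "run",
        "--volume", f"{cid}:{mount}",
        "--image", image,
        "--command", f"{interpreter} {mount}",
    ]


# "QmExample" is a placeholder, not a real CID
print(" ".join(expand_shortcut("myscript.py", "QmExample")))
```

The shortcut only ever produces the explicit form, so anything it cannot detect fails early instead of guessing.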
the point is we always support the explicit mode (using --volume, --image and --command) but we also have some shortcuts that expand into it
I'm happy to explore this! any thoughts? @js-ts how does this line up with what you've already seen?
More
bacalhau run app.py # IF a requirements.txt is present, install?
bacalhau run app.py # IF a conda.yaml is present, install?
bacalhau run mynotebook.ipynb # Uses SAME to install and run? Picks up same.yaml?
I like having single line commands and users not having to deal with CIDs, input volumes, output volumes etc.
For python specifically, I think parameters should be passed the same way python scripts already parse them, rather than through a modified argument format. That is more intuitive and similar to the way things are done in the language itself
so with sys.argv it would be like
bacalhau run foo.py 1 2
in python: python foo.py 1 2
script foo.py
import sys
print("sum:", int(sys.argv[1]) + int(sys.argv[2]))
With argparse we can add flags
bacalhau run hello.py --name Sam
in python: python hello.py --name Sam
script hello.py
# Import the library
import argparse
# Create the parser
parser = argparse.ArgumentParser()
# Add an argument
parser.add_argument('--name', type=str, required=True)
# Parse the argument
args = parser.parse_args()
# Print "Hello" + the user input argument
print('Hello,', args.name)
Note: the dataset path inside the script matters for this to work. We need a better way to capture the environment, so that we can upload the dataset and also know where to put it relative to the script
Datasets with requirements.txt
bacalhau run foo.py -r=requirements.txt --dataset=['/path/to/datasets','/path/to/datasets']
Installing requirements manually
bacalhau run foo.py --dependencies=['pandas','numpy'] --dataset=['/path/to/datasets','/path/to/datasets']
Running a project: here the language part is important, because a python project could use cpython, c++ etc
bacalhau run --language=python --project=/path/to/the/project
⬆️ this assumes datasets are located inside the project folder
bacalhau run --language=python --project=/path/to/the/project --dataset=['/path/to/datasets','/path/to/datasets']
It will install all the dependencies from the project and run the project
@binocarlos https://docs.google.com/document/d/1-e5VsQoi1Ni6RZf3wkH2GRifCt3NblaQAzNbow_Io6U/edit# I modified things in the comment above
We need to walk a very fine line here - we're not building a python specific runtime BUT we can make a lot of python easier.
@js-ts please add all your findings in how to run and the requirements in that document
Like all the ideas shared so far. My personal preference is for the most verbose/explicit flags initially (--command , --input-volume), then building auto-detection and shortcuts over time based on the most frequent usage patterns (which should be measurable at the cluster level).
@binocarlos we need to account for how python handles cli arguments: args, kwargs, *args, **kwargs
@js-ts i disagree - we should not be doing language specific things. We should allow for laying out functions, by default, in a series and have that default into the execution, but that should not be language specific.
So I propose that to get started - we implement the most explicit version of the CLI arguments as the baseline...
i.e. this style:
bacalhau run \
--concurrency 3 \
--image ubuntu \
--input-volume <cid>:/path1 \
--output-volume <name>:/path2 \
--command "echo hello"
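To make the explicit baseline concrete, here is a small python sketch of a parser for exactly these flags. It is not the bacalhau implementation, just an illustration: the default concurrency of 1, and the choice to make --image and --command required, are assumptions.

```python
import argparse


def build_parser():
    """Parser for the explicit flag style shown above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="bacalhau run")
    # Assumption: a single instance when --concurrency is omitted
    parser.add_argument("--concurrency", type=int, default=1)
    parser.add_argument("--image", required=True)
    # Volumes may be repeated, so collect them into lists
    parser.add_argument("--input-volume", action="append", default=[])
    parser.add_argument("--output-volume", action="append", default=[])
    parser.add_argument("--command", required=True)
    return parser


args = build_parser().parse_args([
    "--concurrency", "3",
    "--image", "ubuntu",
    "--input-volume", "<cid>:/path1",
    "--output-volume", "<name>:/path2",
    "--command", "echo hello",
])
print(args)
```

Any shortcut or "overlay" discussed in this thread would then only need to produce this same set of values.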
and then as has been mentioned - we can use this thread to decide on what kind of "overlays" we can do to make the UX nicer for specific use cases...
the change to make now, then, is to remove this style:
bacalhau run ubuntu echo hello
and replace it with the verbose version above
ok, so my proposal is that for the above:
bacalhau run ubuntu echo hello
Would execute an arbitrary command (no CID)
and
bacalhau run ubuntu -i CID cat /input/*
would do the same, but against all the files in the /input directory (which would be the default).
Objections?
I'm calling this dead - good learning, but out of date. cc @simonwo anything worth salvaging?