
CLI design

Open binocarlos opened this issue 2 years ago • 11 comments

Presently, the following format might be quite confusing for non-Docker folks, because it is inspired by kubectl exec:

cid=$(ipfs add -q myscript.py)  # -q prints only the CID
bacalhau run \
  -v $cid:/myscript.py \
  python:3 -- python3 /myscript.py

As a baseline, we should have explicit flags for each thing:

cid=$(ipfs add -q myscript.py)  # -q prints only the CID
bacalhau run \
  -v $cid:/myscript.py \
  --image python:3 \
  --command "python3 /myscript.py"

Here are some examples of friendlier CLI commands:

bacalhau run app.py # Detects Python and runs it in a specific Python version's base image
bacalhau run app.py --params="{JSON BLOB}" # Interprets a JSON blob as key/value pairs to inject as CLI args? Who knows.
bacalhau run app.py --params=var=foo --params=var1=bar # Same as above, but explicit
bacalhau run app.py --python="some sem ver" # Uses the Python base image library/python:"some sem ver"
bacalhau run app.py --image=foobaz/qaz:1.3.4 # Uses a different base image, as specified; overrides detection
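To make the precedence in these examples concrete, here is a minimal sketch of how a CLI might pick the image: an explicit --image wins, --python pins the tag of the official python image, and otherwise the file extension decides. The function name `resolve_image` and the `DEFAULT_IMAGES` table are hypothetical, not part of bacalhau.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical extension -> default image table; only Python is covered here.
DEFAULT_IMAGES = {".py": "python:3"}


def resolve_image(script: str, image: Optional[str] = None,
                  python: Optional[str] = None) -> str:
    """Choose the container image for `bacalhau run <script>` (sketch)."""
    # An explicit --image always overrides detection.
    if image:
        return image
    suffix = PurePosixPath(script).suffix
    # --python pins the tag of the official python image (library/python).
    if python:
        if suffix != ".py":
            raise ValueError("--python only makes sense for .py scripts")
        return f"python:{python}"
    # Otherwise fall back to detection by file extension.
    if suffix not in DEFAULT_IMAGES:
        raise ValueError(f"cannot detect a runtime for {script!r}; pass --image")
    return DEFAULT_IMAGES[suffix]
```

For example, `resolve_image("app.py", python="3.10")` gives `"python:3.10"`, while an unknown extension fails loudly instead of guessing.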

we have:

  • named specific flags
  • an array of the other arguments
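That split (named flags plus an array of everything else) maps directly onto Python's argparse. The sketch below is hypothetical: the flag set mirrors the examples above rather than bacalhau's real CLI, and `parse_params` shows one way to accept --params either as a JSON object or as repeated key=value pairs.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags modelled on the examples above.
    p = argparse.ArgumentParser(prog="bacalhau run")
    p.add_argument("--image")
    p.add_argument("--python")
    # --params may be repeated; each value is key=value or a JSON object.
    p.add_argument("--params", action="append", default=[])
    # Everything else is collected as an array of positional arguments.
    p.add_argument("args", nargs="*")
    return p


def parse_params(values: list[str]) -> dict[str, str]:
    """Merge repeated --params values into a single dict of strings."""
    merged: dict[str, str] = {}
    for v in values:
        if v.lstrip().startswith("{"):
            # JSON blob form: every key/value pair becomes a parameter.
            merged.update({k: str(x) for k, x in json.loads(v).items()})
        else:
            # Explicit form: split on the first '='.
            key, sep, val = v.partition("=")
            if not sep:
                raise ValueError(f"expected key=value, got {v!r}")
            merged[key] = val
    return merged
```

With this, `bacalhau run app.py --params=var=foo --params=var1=bar` parses to `args == ["app.py"]` and parameters `{"var": "foo", "var1": "bar"}`.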

so this:

bacalhau run myscript.py

can be turned into this automatically:

bacalhau run \
  --volume $cid:/myscript.py \
  --image python:3 \
  --command "python3 /myscript.py"

because the CLI sees one positional argument (myscript.py) and so does the following:

  • auto-detects that it's a Python script and checks that it exists on the local file system
  • uploads that Python script to an IPFS node (potentially via the requestor node)
  • turns the python script into a CID mounted volume
  • calls the expanded run command above
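The four steps above can be sketched as a pure function that returns the expanded argv. This is an illustration under assumptions: `expand_run` is hypothetical, and the upload step is an injected `upload` callable (which in practice might shell out to `ipfs add -q`) so the logic can be exercised offline.

```python
import os
import shlex
from pathlib import PurePosixPath
from typing import Callable


def expand_run(
    script: str,
    upload: Callable[[str], str],
    image: str = "python:3",
    exists: Callable[[str], bool] = os.path.isfile,
) -> list[str]:
    """Expand `bacalhau run myscript.py` into the explicit form (sketch)."""
    # Step 1: check the script exists on the local file system.
    if not exists(script):
        raise FileNotFoundError(script)
    # Step 2: upload it; the hypothetical `upload` hook returns its CID.
    cid = upload(script)
    # Step 3: mount that CID at /<basename> inside the container.
    mount = f"/{PurePosixPath(script).name}"
    # Step 4: build the explicit run command.
    return [
        "bacalhau", "run",
        "--volume", f"{cid}:{mount}",
        "--image", image,
        "--command", shlex.join(["python3", mount]),
    ]
```

A real CLI would then hand the returned list to `subprocess.run`; keeping expansion separate from execution also makes it easy to print the explicit command for users who want to learn it.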

the point is that we always support the explicit mode (using --volume, --image and --command), but we also have some shortcuts that expand into it

binocarlos, Jun 15 '22 18:06

I'm happy to explore this! Any thoughts? @js-ts, how does this line up with what you've already seen?

aronchick, Jun 15 '22 19:06

More:

bacalhau run app.py # IF a requirements.txt is present, install?
bacalhau run app.py # IF a conda.yaml is present, install?
bacalhau run mynotebook.ipynb # Uses SAME to install and run? Picks up same.yaml?
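One way to answer the first two questions is to derive setup steps from whichever dependency files sit next to the entrypoint. The sketch below is hypothetical (`plan_setup` is not a bacalhau function); running conda before pip when both files exist is an assumption, and the notebook/SAME case is left out because the thread leaves it open.

```python
from typing import Iterable


def plan_setup(files: Iterable[str]) -> list[str]:
    """Return setup commands implied by dependency files (sketch).

    `files` is the listing of the entrypoint's directory,
    e.g. os.listdir(project_dir).
    """
    present = set(files)
    steps: list[str] = []
    # Assumption: create/update the conda env first, then layer pip on top.
    if "conda.yaml" in present:
        steps.append("conda env update --file conda.yaml")
    if "requirements.txt" in present:
        steps.append("pip install -r requirements.txt")
    return steps
```

Passing the directory listing in, rather than reading the disk inside the function, keeps the detection rules easy to test and to print back to the user before anything is installed.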

aronchick, Jun 15 '22 19:06

I like having single line commands and users not having to deal with CIDs, input volumes, output volumes etc.

For Python specifically, I think parameters should be passed the same way they are parsed in Python scripts, rather than changing how arguments are provided, so that it feels intuitive and similar to how things are done in that language.

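So with sys.argv, bacalhau run foo.py 1 2 would behave like python foo.py 1 2 in Python, with the script foo.py:

import sys
# sys.argv[0] is the script name; arguments arrive as strings
print("sum:", int(sys.argv[1]) + int(sys.argv[2]))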

With argparse we can add flags: bacalhau run hello.py --name Sam would behave like python hello.py --name Sam in Python.

Script hello.py:

# Import the library
import argparse
# Create the parser
parser = argparse.ArgumentParser()
# Add an argument
parser.add_argument('--name', type=str, required=True)
# Parse the argument
args = parser.parse_args()
# Print "Hello" + the user input argument
print('Hello,', args.name)

Note that the dataset path inside the script matters. For this to work, we need a better way to capture the environment, so that we can upload the dataset and also know where to put it relative to the script.
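A minimal sketch of that idea, assuming the script's directory becomes a working directory inside the container and each dataset keeps its path relative to the script. The function `plan_mounts` and the `/work` root are hypothetical choices for illustration.

```python
from pathlib import PurePosixPath


def plan_mounts(script: str, datasets: list[str],
                root: str = "/work") -> dict[str, str]:
    """Map local paths to container paths so relative references still resolve.

    The script's directory becomes `root` (a hypothetical default) inside
    the container; each dataset keeps its path relative to that directory.
    """
    base = PurePosixPath(script).parent
    container_root = PurePosixPath(root)
    mounts = {script: str(container_root / PurePosixPath(script).name)}
    for ds in datasets:
        # Raises ValueError for datasets outside the script's directory;
        # those would need an explicit mount point instead.
        rel = PurePosixPath(ds).relative_to(base)
        mounts[ds] = str(container_root / rel)
    return mounts
```

For this to work, the script would also have to run with `root` as its working directory, so that a relative path such as `data/train.csv` resolves the same way it does locally.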

With a requirements.txt and datasets: bacalhau run foo.py -r=requirements.txt --dataset=['/path/to/datasets','/path/to/datasets']

Installing requirements manually: bacalhau run foo.py --dependencies=['pandas','numpy'] --dataset=['/path/to/datasets','/path/to/datasets']
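Both variants could reduce to a single pip step run before the script. The sketch below assumes the proposed -r and --dependencies values have already been parsed into Python values; `install_command` is a hypothetical helper, and `shlex.join` is used so package names are quoted safely for a shell.

```python
import shlex
from typing import Optional, Sequence


def install_command(requirements: Optional[str] = None,
                    dependencies: Sequence[str] = ()) -> Optional[str]:
    """Build the pip step for the proposed -r / --dependencies flags (sketch)."""
    args = ["pip", "install"]
    if requirements:
        args += ["-r", requirements]
    args += list(dependencies)
    if len(args) == 2:
        return None  # nothing to install
    return shlex.join(args)
```

Returning `None` when neither flag is given lets the caller skip the setup step entirely instead of running an empty `pip install`.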

Running a project: here the language matters, since a Python project could involve CPython, C++, etc.

bacalhau run --language=python --project=/path/to/the/project

⬆️ This assumes the datasets are located inside the project folder.

bacalhau run --language=python --project=/path/to/the/project --dataset=['/path/to/datasets','/path/to/datasets']

This would install all the dependencies from the project and run it.

js-ts, Jun 15 '22 19:06

@binocarlos https://docs.google.com/document/d/1-e5VsQoi1Ni6RZf3wkH2GRifCt3NblaQAzNbow_Io6U/edit# I modified things in the comment above

js-ts, Jun 15 '22 20:06

We need to walk a very fine line here - we're not building a python specific runtime BUT we can make a lot of python easier.

aronchick, Jun 15 '22 22:06

@js-ts please add all your findings on how to run things, and the requirements, to that document

aronchick, Jun 15 '22 22:06

Like all the ideas shared so far. My personal preference is for the most verbose/explicit flags initially (--command , --input-volume), then building auto-detection and shortcuts over time based on the most frequent usage patterns (which should be measurable at the cluster level).

wesfloyd, Jun 16 '22 13:06

@binocarlos we need to account for Python CLI arguments: positional args, keyword args, *args and **kwargs

js-ts, Jun 16 '22 18:06

@js-ts I disagree - we should not be doing language-specific things. We should allow for laying out functions, by default, in a series and have that default into the execution, but that should not be language-specific.

aronchick, Jun 17 '22 00:06

So I propose that to get started - we implement the most explicit version of the CLI arguments as the baseline...

i.e. this style:

bacalhau run \
  --concurrency 3 \
  --image ubuntu \
  --input-volume <cid>:/path1 \
  --output-volume <name>:/path2 \
  --command "echo hello"
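The explicit style is straightforward to parse with argparse; the interesting part is validating `<source>:/path` volume specs. This is a sketch, not bacalhau's implementation: the `Volume` type and `volume` converter are hypothetical, and defaulting --concurrency to 1 is an assumption.

```python
import argparse
from typing import NamedTuple


class Volume(NamedTuple):
    source: str  # a CID for inputs, a name for outputs
    target: str  # absolute path inside the container


def volume(spec: str) -> Volume:
    """argparse `type=` converter for '<source>:/path' specs."""
    source, sep, target = spec.partition(":")
    if not sep or not source or not target.startswith("/"):
        raise argparse.ArgumentTypeError(f"expected <source>:/path, got {spec!r}")
    return Volume(source, target)


parser = argparse.ArgumentParser(prog="bacalhau run")
parser.add_argument("--concurrency", type=int, default=1)  # default is assumed
parser.add_argument("--image", required=True)
# Volumes may be given several times; each value is validated by `volume`.
parser.add_argument("--input-volume", type=volume, action="append", default=[])
parser.add_argument("--output-volume", type=volume, action="append", default=[])
parser.add_argument("--command", required=True)
```

Because every piece has its own flag, bad input such as a missing `:/path` is reported against the exact flag, which is part of the appeal of starting from the explicit form.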

and then as has been mentioned - we can use this thread to decide on what kind of "overlays" we can do to make the UX nicer for specific use cases...

the change to make now, then, is to remove this style:

bacalhau run ubuntu echo hello

and replace it with the verbose version above

binocarlos, Jun 18 '22 13:06

ok, so my proposal is that for the above:

bacalhau run ubuntu echo hello

would execute an arbitrary command (no CID)

and

bacalhau run ubuntu -i CID cat /input/*

would do the same, but against all the files in the /input directory (which would be the default).

Objections?
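A sketch of how this short form could be rewritten into the explicit flags, assuming at most one -i per run (all inputs share the single default /input mount). `expand_shortcut` is hypothetical. The command is joined with plain spaces rather than `shlex.join` so that a glob like `/input/*` reaches the container unquoted; that only works if the command string is run through a shell there.

```python
DEFAULT_INPUT_MOUNT = "/input"  # default mount point from the proposal


def expand_shortcut(argv: list[str]) -> list[str]:
    """Rewrite `IMAGE [-i CID] CMD...` into explicit flags (sketch)."""
    if not argv:
        raise ValueError("missing image")
    image, rest = argv[0], argv[1:]
    out = ["--image", image]
    # Optional single input CID, mounted at the default location.
    if rest[:1] == ["-i"]:
        if len(rest) < 2:
            raise ValueError("-i needs a CID")
        out += ["--input-volume", f"{rest[1]}:{DEFAULT_INPUT_MOUNT}"]
        rest = rest[2:]
    if not rest:
        raise ValueError("missing command")
    # Plain join keeps globs unquoted for the container's shell to expand.
    out += ["--command", " ".join(rest)]
    return out
```

Note that users would still need to quote `/input/*` locally (e.g. `'/input/*'`) so their own shell does not try to expand it first.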

aronchick, Jul 05 '22 22:07

I'm calling this dead - good learning, but out of date. cc @simonwo anything worth salvaging?

aronchick, Dec 30 '23 17:12