sqlflow
sqlflow copied to clipboard
Simplify the command parameters in TO RUN statement.
In the TO RUN
statement, the first parameter after CMD
keyword is the absolute path of a Python program or an executable in the docker image. Please check the following SQL statement. It runs the binning method on the amount
column of the table sqlflow_tutorial_sample_binning
.
SELECT * FROM sqlflow_tutorial_sample_binning
TO RUN a_docker_registry/sqlflow-run:binning-v0.1.0
CMD
"/opt/sqlflow/run/binning.py",
"column=amount",
"bin_method=log_bucket",
"bin_num=10"
INTO sqlflow_tutorial_statistics,sqlflow_tutorial_binned_prob,sqlflow_tutorial_binned_cumsum_prob;
The absolute path is too long for the end users. We can simplify it and just keep binning.py
. In order to make it, we set up the standard that all the scripts or executables for Runnable should be in the folder /opt/sqlflow/run
.
How about TO RUN a_docker_registry/sqlflow-run:v1/binning.py
How about
TO RUN a_docker_registry/sqlflow-run:v1/binning.py
I think it's a good idea. It's aligned with the model training statement.
Current
SELECT * FROM sqlflow_tutorial_sample_binning
TO RUN a_docker_registry/sqlflow-run:binning-v0.1.0
CMD
"binning.py",
"column=amount",
"bin_method=log_bucket",
"bin_num=10"
INTO sqlflow_tutorial_statistics,sqlflow_tutorial_binned_prob,sqlflow_tutorial_binned_cumsum_prob;
Proposed from @typhoonzero (More aligned with TO TRAIN statement)
SELECT * FROM sqlflow_tutorial_sample_binning
TO RUN a_docker_registry/sqlflow-run:binning-v0.1.0/binning.py
CMD
"column=amount",
"bin_method=log_bucket",
"bin_num=10"
INTO sqlflow_tutorial_statistics,sqlflow_tutorial_binned_prob,sqlflow_tutorial_binned_cumsum_prob;
Difference: Move the python file name binning.py
or executable name from CMD parameter to the tail of docker image name after TO RUN keyword.
Current
SELECT * FROM sqlflow_tutorial_sample_binning TO RUN a_docker_registry/sqlflow-run:binning-v0.1.0 CMD "binning.py", "column=amount", "bin_method=log_bucket", "bin_num=10" INTO sqlflow_tutorial_statistics,sqlflow_tutorial_binned_prob,sqlflow_tutorial_binned_cumsum_prob;
Proposed from @typhoonzero (More aligned with TO TRAIN statement)
SELECT * FROM sqlflow_tutorial_sample_binning TO RUN a_docker_registry/sqlflow-run:binning-v0.1.0/binning.py CMD "column=amount", "bin_method=log_bucket", "bin_num=10" INTO sqlflow_tutorial_statistics,sqlflow_tutorial_binned_prob,sqlflow_tutorial_binned_cumsum_prob;
Difference: Move the python file name
binning.py
or executable name from CMD parameter to the tail of docker image name after TO RUN keyword.
In the latter solution, I'm not sure if the leading CMD
is appropriate.