Simplify the command parameters in TO RUN statement.

Open brightcoder01 opened this issue 4 years ago • 4 comments

In the TO RUN statement, the first parameter after the CMD keyword is the absolute path of a Python program or an executable in the Docker image. Consider the following SQL statement, which runs the binning method on the amount column of the table sqlflow_tutorial_sample_binning.

SELECT * FROM sqlflow_tutorial_sample_binning
TO RUN a_docker_registry/sqlflow-run:binning-v0.1.0
CMD
    "/opt/sqlflow/run/binning.py",
    "column=amount",
    "bin_method=log_bucket",
    "bin_num=10"
INTO sqlflow_tutorial_statistics,sqlflow_tutorial_binned_prob,sqlflow_tutorial_binned_cumsum_prob;
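
To make the role of the remaining CMD entries concrete, here is a minimal sketch of how a runnable such as binning.py might turn them into a dictionary. It assumes the entries after the program name reach the program as plain `key=value` strings; the helper name `parse_params` is hypothetical and not part of SQLFlow.

```python
from typing import Dict, List


def parse_params(args: List[str]) -> Dict[str, str]:
    """Parse CMD-style "key=value" strings into a dict (hypothetical helper)."""
    params: Dict[str, str] = {}
    for arg in args:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        params[key] = value
    return params


if __name__ == "__main__":
    print(parse_params(["column=amount", "bin_method=log_bucket", "bin_num=10"]))
```

Values stay as strings in this sketch; converting `bin_num` to an integer would be up to the runnable.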

The absolute path is too long for end users. We can simplify it and keep only binning.py. To make this work, we establish the convention that all scripts and executables for a runnable must live in the folder /opt/sqlflow/run.
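
A minimal sketch of that convention, assuming the resolution happens before the command is launched in the container: a bare program name is joined onto /opt/sqlflow/run. The function name `resolve_program` is hypothetical, and passing absolute paths through unchanged (so existing statements keep working) is an assumption rather than something decided in this issue.

```python
import posixpath

# Folder proposed in this issue for all runnable scripts and executables.
RUN_DIR = "/opt/sqlflow/run"


def resolve_program(name: str, run_dir: str = RUN_DIR) -> str:
    """Return the in-image path for the first CMD entry (hypothetical helper)."""
    if posixpath.isabs(name):
        # Assumption: an absolute path is still accepted as-is.
        return name
    return posixpath.join(run_dir, name)


if __name__ == "__main__":
    print(resolve_program("binning.py"))
```

`posixpath` is used instead of `os.path` so the result is a Linux container path even when the code runs on another platform.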

brightcoder01 avatar Aug 06 '20 08:08 brightcoder01

How about TO RUN a_docker_registry/sqlflow-run:v1/binning.py?

typhoonzero avatar Aug 06 '20 09:08 typhoonzero

How about TO RUN a_docker_registry/sqlflow-run:v1/binning.py?

I think it's a good idea. It's aligned with the model training statement.

brightcoder01 avatar Aug 10 '20 02:08 brightcoder01

Current

SELECT * FROM sqlflow_tutorial_sample_binning
TO RUN a_docker_registry/sqlflow-run:binning-v0.1.0
CMD
    "binning.py",
    "column=amount",
    "bin_method=log_bucket",
    "bin_num=10"
INTO sqlflow_tutorial_statistics,sqlflow_tutorial_binned_prob,sqlflow_tutorial_binned_cumsum_prob;

Proposed by @typhoonzero (more aligned with the TO TRAIN statement)

SELECT * FROM sqlflow_tutorial_sample_binning
TO RUN a_docker_registry/sqlflow-run:binning-v0.1.0/binning.py
CMD
    "column=amount",
    "bin_method=log_bucket",
    "bin_num=10"
INTO sqlflow_tutorial_statistics,sqlflow_tutorial_binned_prob,sqlflow_tutorial_binned_cumsum_prob;

Difference: the Python file name binning.py (or the executable name) moves from the first CMD parameter to the end of the Docker image name that follows the TO RUN keyword.
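
For illustration, the proposed form could be split back into an image and a program roughly as follows. This is a sketch under stated assumptions, not SQLFlow's parser: it assumes the image always carries a tag and that the program name contains no colon. Because Docker tags cannot contain a slash, everything after the first slash following the tag is taken as the program. `RunTarget` and `split_run_target` are hypothetical names.

```python
from typing import NamedTuple


class RunTarget(NamedTuple):
    image: str    # e.g. "a_docker_registry/sqlflow-run:binning-v0.1.0"
    program: str  # e.g. "binning.py"


def split_run_target(target: str) -> RunTarget:
    """Split "<repo>:<tag>/<program>" into image and program (hypothetical helper)."""
    # The last ":" is taken as the tag separator (assumption: a tag is present).
    repo, colon, rest = target.rpartition(":")
    if not colon or not repo:
        raise ValueError(f"missing image tag in {target!r}")
    # A tag cannot contain "/", so the first "/" after it starts the program.
    tag, slash, program = rest.partition("/")
    if not tag or not slash or not program:
        raise ValueError(f"missing program name in {target!r}")
    return RunTarget(image=f"{repo}:{tag}", program=program)


if __name__ == "__main__":
    print(split_run_target("a_docker_registry/sqlflow-run:binning-v0.1.0/binning.py"))
```

A target in the current form, with no program after the tag, raises `ValueError` in this sketch, so the two syntaxes would have to be distinguished before calling it.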

brightcoder01 avatar Aug 11 '20 00:08 brightcoder01

In the latter (proposed) solution, I'm not sure the leading CMD keyword is still appropriate.

weiguoz avatar Aug 11 '20 03:08 weiguoz