Build a SQLFlow compiler command-line tool

Open lhw362950217 opened this issue 5 years ago • 8 comments

As a compiler, SQLFlow compiles extended SQL and generates target code for the execution platform. Its output consists of .yml files and .py programs, but currently users can't access these files directly. Other compilers, by contrast, always generate a target file, such as an executable or a dynamic library. SQLFlow should likewise be able to compile a source program into intermediate objects, link these objects into a target, and output that target. We are considering building a command-line tool to solve this problem.

Things to be discussed:

  1. The output format of the target. I think it can simply be a tarball.
  2. How should we build the command-line tool? Three proposals are listed below:
    • Add a new sqlflow subcommand, such as sqlflow compile a.sql. The sqlflow client sends this command to sqlflowserver and returns the generated tarball to the user.
    • Build a Docker image that includes a Go command-line binary, the Java parser libraries, and a Python requirements file. This makes the tool package large, but it does not rely on the SQLFlow server.
    • Wrap all needed dependencies into an all-in-one package that can run locally.

lhw362950217 • Jun 24 '20 09:06

If we build a parser command-line tool, do we also need a submitter tool to submit the built target?

sneaxiy • Jun 24 '20 11:06

> If we build a parser command-line tool, do we also need a submitter tool to submit the built target?

Actually, we can use kubectl apply -f output.yaml; it's too simple to justify building a submitter for it.

lhw362950217 • Jun 28 '20 13:06

We need to output both the workflow and the compiled steps, for example:

  1. sqlflow -workflow a.sql -o a.yaml compiles a SQL program into a workflow.
  2. sqlflow -step a.sql -o a.py compiles a single statement into a Python script.
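A minimal Go sketch of how the two modes above could be parsed with the standard flag package. Only the flag names -workflow, -step and -o come from the comment; options, parseArgs and the mutual-exclusion rule are assumptions, and the compilation itself is stubbed out.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options holds the parsed settings (hypothetical structure).
type options struct {
	workflow string // SQL program to compile into a workflow YAML
	step     string // SQL file holding one statement to compile into Python
	out      string // path of the generated file
}

// parseArgs parses the flags and enforces that exactly one mode is chosen.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("sqlflow", flag.ContinueOnError)
	fs.StringVar(&o.workflow, "workflow", "", "compile a SQL program to a workflow YAML")
	fs.StringVar(&o.step, "step", "", "compile a single statement to a Python script")
	fs.StringVar(&o.out, "o", "", "output file")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// Both empty or both set is an error: the two modes are mutually exclusive.
	if (o.workflow == "") == (o.step == "") {
		return o, errors.New("exactly one of -workflow or -step is required")
	}
	if o.out == "" {
		return o, errors.New("-o is required")
	}
	return o, nil
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	// The actual compilation is out of scope for this sketch.
	if o.workflow != "" {
		fmt.Printf("compile %s -> workflow %s\n", o.workflow, o.out)
	} else {
		fmt.Printf("compile %s -> Python script %s\n", o.step, o.out)
	}
}
```

Note that Go's flag package stops at the first non-flag argument, so a subcommand form like sqlflow compile a.sql -o out.tar.gz would need the flags placed before the file name, or a different argument parser.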

typhoonzero • Jun 29 '20 02:06

> We need to output both the workflow and the compiled steps, for example:
>
>   1. sqlflow -workflow a.sql -o a.yaml compiles a SQL program into a workflow.
>   2. sqlflow -step a.sql -o a.py compiles a single statement into a Python script.

In my proposal, these files are also generated, but zipped into a tarball. I think one command is enough:

sqlflow compile a.sql -o out.tar.gz

lhw362950217 • Jun 29 '20 02:06

No, both the workflow and the step are needed. If we turn each step into pure Python code for execution, the output of sqlflow -workflow a.sql -o a.yaml contains everything. Yet when we want to compile just one step and debug or test it locally, we need to output the Python script and run it.

typhoonzero • Jun 29 '20 03:06

>> If we build a parser command-line tool, do we also need a submitter tool to submit the built target?
>
> Actually, we can use kubectl apply -f output.yaml; it's too simple to justify building a submitter for it.

If we use kubectl apply -f output.yaml merely to run a workflow, then,

  1. How do we trace the workflow?
  2. Where is the translated Python code? Is it in the tarball, or generated by the command-line tool?

weiguoz • Jun 29 '20 07:06

>>> If we build a parser command-line tool, do we also need a submitter tool to submit the built target?
>>
>> Actually, we can use kubectl apply -f output.yaml; it's too simple to justify building a submitter for it.
>
> If we use kubectl apply -f output.yaml merely to run a workflow, then,
>
>   1. How do we trace the workflow?
>   2. Where is the translated Python code? Is it in the tarball, or generated by the command-line tool?

  1. If we submit the job manually (which, of course, is not the usual case), we can trace the steps in Argo's dashboard.
  2. I think the Python code is in the tarball, not generated by the command-line tool. However, the final structure of the generated code is not settled yet. As @typhoonzero mentioned, it may be a single .yaml file, or a .yaml file plus other Python code files.

lhw362950217 • Jun 30 '20 00:06

> No, both the workflow and the step are needed. If we turn each step into pure Python code for execution, the output of sqlflow -workflow a.sql -o a.yaml contains everything. Yet when we want to compile just one step and debug or test it locally, we need to output the Python script and run it.

Previous discussions mentioned that we may not be able to put all the code into one .yaml file, though that is not a final conclusion yet. If the output contains multiple files, we can implement one command; if it is just a single .yaml file, we may implement two commands.
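Whichever layout is chosen, the tool has to decide how to write what the compiler produced. A hypothetical Go sketch of that decision, assuming a lone generated file is written straight to the -o path while several files must be bundled into a .tar.gz; planOutput and the .tar.gz rule are illustrative, not a settled design.

```go
package main

import (
	"fmt"
	"strings"
)

// planOutput describes how the generated files would be written to out.
// Hypothetical rule: one file is written directly; several files require
// a .tar.gz output path so they can be bundled together.
func planOutput(generated []string, out string) (string, error) {
	switch {
	case len(generated) == 0:
		return "", fmt.Errorf("compiler produced no files")
	case len(generated) == 1:
		return "write " + generated[0] + " to " + out, nil
	case strings.HasSuffix(out, ".tar.gz"):
		return fmt.Sprintf("bundle %d files into %s", len(generated), out), nil
	default:
		return "", fmt.Errorf("%d files generated; -o must end in .tar.gz, got %q", len(generated), out)
	}
}

func main() {
	cases := []struct {
		files []string
		out   string
	}{
		{[]string{"a.yaml"}, "a.yaml"},
		{[]string{"a.yaml", "step_1.py"}, "out.tar.gz"},
		{[]string{"a.yaml", "step_1.py"}, "a.yaml"},
	}
	for _, c := range cases {
		plan, err := planOutput(c.files, c.out)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(plan)
	}
}
```

With this rule the single-.yaml case needs no archive at all, while the multi-file case keeps the one-command tarball proposal intact.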

lhw362950217 • Jun 30 '20 01:06