sqlflow
Build a SQLFlow compiler command-line tool
As a compiler, SQLFlow compiles extended SQL and generates target code according to the execution platform. The output of SQLFlow is a set of .yml files and .py programs. Currently, users can't access these files directly. However, other compilers typically generate a target file, like an executable or a dynamic library. SQLFlow should also provide a way to compile a source program into intermediate objects, link these objects into a target, and finally output the target. We propose building a command-line tool that solves this problem.
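As a rough illustration of the proposed compile-then-link flow, the "link" step could be as simple as collecting the generated workflow and step files into one archive. This is a hypothetical sketch only; the function name `bundle_target` and the flat archive layout are assumptions, not an existing SQLFlow API:

```python
import os
import tarfile

def bundle_target(output_files, target_path):
    """Pack compiler outputs (e.g. a .yaml workflow and .py step
    scripts) into a single gzipped tarball, the proposed "target".
    """
    with tarfile.open(target_path, "w:gz") as tar:
        for path in output_files:
            # store each file flat at the archive root
            tar.add(path, arcname=os.path.basename(path))
    return target_path
```

The archive then plays the role that an executable or dynamic library plays for a conventional compiler: a single artifact the user can copy around and submit.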
Things to be discussed:
- The output format of the target: I think it can simply be a tarball.
- How to build the command-line tool? Three proposals are listed below:
  - Add a new `sqlflow` command like `sqlflow compile a.sql`; `sqlflow` sends this command to `sqlflowserver` and fetches the generated tarball for the user.
  - Build a Docker image that includes a Go command-line binary, some Java parser libs, and a Python requirements file. This makes the tool package large, but it does not rely on the SQLFlow server.
  - Wrap all needed dependencies into an all-in-one package, which can run locally.
If we build a parser command-line tool, do we also need a submitter tool to submit the built target?
> If we build a parser command-line tool, do we also need a submitter tool to submit the built target?

Actually, we can use `kubectl apply -f output.yaml`; the job is too simple to justify building a submitter for it.
We need to output both the workflow and the compiled steps:
- `sqlflow -workflow a.sql -o a.yaml` compiles a SQL program to a workflow.
- `sqlflow -step a.sql -o a.py` compiles a single statement to a Python script.
> We need to output both the workflow and the compiled steps:
> - `sqlflow -workflow a.sql -o a.yaml` compiles a SQL program to a workflow.
> - `sqlflow -step a.sql -o a.py` compiles a single statement to a Python script.
In my proposal, these files are also generated, but zipped into a tarball. I think one command is enough: `sqlflow compile a.sql -o out.tar.gz`.
No, both the workflow and the steps are needed. If we compile each step to pure Python code for execution, the command `sqlflow -workflow a.sql -o a.yaml` contains everything. Yet when we want to compile just one step and debug or test it locally, we need to output the Python script and run it.
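To make the local-debugging case concrete: if each compiled step is a plain Python script, it can be run and tested without any workflow engine. The shape below is purely illustrative; the real structure of the generated code is still an open question in this thread:

```python
# A hypothetical compiled step: a self-contained script that executes
# one translated SQL statement. All names here are illustrative only.
def run_step(statement):
    # a real step would open a database connection and execute the
    # translated statement; here we just echo it so the script can
    # be exercised locally
    print("executing:", statement)
    return 0

if __name__ == "__main__":
    # debug a single compiled step locally with: python a.py
    raise SystemExit(run_step("SELECT * FROM train_table"))
```

This is exactly the artifact `sqlflow -step a.sql -o a.py` would hand the user for local iteration.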
> If we build a parser command-line tool, do we also need a submitter tool to submit the built target?
>
> Actually, we can use `kubectl apply -f output.yaml`; the job is too simple to justify building a submitter for it.
If we use `kubectl apply -f output.yaml` merely to run a workflow, then:
- How do we trace the workflow?
- I'm not sure where the translated Python code is: is it in the tarball, or is it generated by the command-line tool?
> If we build a parser command-line tool, do we also need a submitter tool to submit the built target?
>
> Actually, we can use `kubectl apply -f output.yaml`; the job is too simple to justify building a submitter for it.
>
> If we use `kubectl apply -f output.yaml` merely to run a workflow, then:
>
> - How do we trace the workflow?
> - I'm not sure where the translated Python code is: is it in the tarball, or is it generated by the command-line tool?
- If we submit the job manually (of course, this is not the usual case), we can go to Argo's dashboard to trace the steps.
- I think the Python code is in the tarball, not generated by the command-line tool. However, the final generated code structure is not clear yet; as @typhoonzero mentioned, it may be a single `.yaml`, or a `.yaml` with other Python code files.
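If the Python code does live in the tarball, answering "where is the translated code?" for a given target is a one-liner. This helper is a sketch under that assumption; `list_target` is a hypothetical name, not an existing tool:

```python
import tarfile

def list_target(tar_path):
    """Return the file names inside a compiled target tarball, e.g.
    to check whether it holds only a .yaml or also step .py files.
    """
    with tarfile.open(tar_path, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers())
```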
> No, both the workflow and the steps are needed. If we compile each step to pure Python code for execution, the command `sqlflow -workflow a.sql -o a.yaml` contains everything. Yet when we want to compile just one step and debug or test it locally, we need to output the Python script and run it.
Previous discussion has mentioned that we may not be able to put all the code in one `.yaml` file; of course, this is not the final conclusion yet. If the output contains multiple files, we can implement one command; if it outputs just a single `.yaml`, we may implement two commands.
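Either choice is easy to express with a standard argument parser, so the decision can be deferred. The sketch below shows the two-command variant; every command and flag name here is hypothetical:

```python
import argparse

def build_cli():
    """Sketch of a two-command interface: `compile` emits the whole
    workflow target, `step` emits a single step's Python script.
    Command and flag names are assumptions for illustration.
    """
    parser = argparse.ArgumentParser(prog="sqlflow")
    sub = parser.add_subparsers(dest="command", required=True)

    compile_cmd = sub.add_parser("compile", help="compile a SQL program")
    compile_cmd.add_argument("source")
    compile_cmd.add_argument("-o", "--output", default="out.tar.gz")

    step_cmd = sub.add_parser("step", help="compile a single statement")
    step_cmd.add_argument("source")
    step_cmd.add_argument("-o", "--output", default="step.py")

    return parser
```

Collapsing this to one command later would only mean dropping a subparser, so the interface question need not block the tool itself.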