sqlflow
Make the TO RUN component not bind to a single SQL engine
According to the current TO RUN design doc, the component uses environment variables as its context, which causes two problems:
- The component is bound to a single SQL engine.
- It cannot reuse the capabilities of the SQLFlow compiler, e.g. feature derivation.
My initial thought is that the context should be an object rather than a set of environment variables. I will follow this issue and look for a way to solve this problem.
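To make the "context as an object" idea concrete, here is a minimal sketch. The names `RunContext`, `read_rows`, and `count_rows` are hypothetical and not part of SQLFlow; the point is only that the component receives everything through one object and never touches an engine-specific API itself.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Tuple

Row = Tuple[Any, ...]


@dataclass
class RunContext:
    """Hypothetical context object handed to a TO RUN component."""
    select: str                                 # the original SELECT statement
    read_rows: Callable[[str], Iterable[Row]]   # engine-specific reader, injected by the caller
    params: Dict[str, Any] = field(default_factory=dict)


def count_rows(ctx: RunContext) -> int:
    """Example component: it only uses the context, never a SQL engine API."""
    return sum(1 for _ in ctx.read_rows(ctx.select))


def fake_engine(select: str) -> Iterable[Row]:
    """Stand-in for a real engine; returns fixed rows for illustration."""
    return iter([(1, "a"), (2, "b"), (3, "c")])


ctx = RunContext(select="SELECT * FROM t", read_rows=fake_engine)
row_count = count_rows(ctx)
```

Swapping `fake_engine` for a reader backed by another engine would not require any change to `count_rows`, which is the decoupling this issue asks for.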
Per discussions in our daily meeting, we should have two kinds of interfaces to help users contribute components:
- Input/Output: an IO interface, so that users don't need to call the various SQL engine APIs to fetch data, just like db_generator: https://github.com/sql-machine-learning/sqlflow/blob/7a14735feb616754e5d4ed293c719e8a4446236c/python/sqlflow_submitter/db.py#L223-L240
- Submitter: users only need to care about the computation logic; SQLFlow can submit it to various AI platforms.
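As a rough illustration of the Input/Output interface, the sketch below wraps a DB-API connection in a generator. It is not the actual `db_generator` from the link above; `row_generator` and its parameters are assumptions, and `sqlite3` is used only as a stand-in for a real SQL engine.

```python
import sqlite3


def row_generator(conn, statement, fetch_size=128):
    """Yield the rows of `statement` one by one, fetching them in batches.

    Hypothetical helper: relies only on the generic DB-API cursor methods.
    """
    cursor = conn.cursor()
    cursor.execute(statement)
    try:
        while True:
            batch = cursor.fetchmany(fetch_size)
            if not batch:
                break
            for row in batch:
                yield row
    finally:
        cursor.close()


# Stand-in database so the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iris (sepal_length REAL, label INTEGER)")
conn.executemany("INSERT INTO iris VALUES (?, ?)", [(5.1, 0), (6.2, 1), (7.3, 2)])

rows = list(row_generator(
    conn, "SELECT sepal_length, label FROM iris ORDER BY label", fetch_size=2))
```

A contributed script would consume such a generator without knowing which engine produced the rows.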
I agree with this design: script contributors don't have to write functions from scratch; instead, they can just write:
def user_run_function(gen, args, **kwargs):  # **kwargs stands in for the elided parameters
    for row in gen:
        # deal with the data
        pass
This may not work when submitting a job to a distributed platform; in that case, reading data from the original SELECT statement should be managed by the function itself.
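One possible way to support both cases is sketched below, with all names (`run_component`, `manages_own_io`, and the example functions) being hypothetical. A generator-fed function receives rows, while a function that manages its own reading receives the SELECT statement unchanged.

```python
def run_component(func, select, args, read_rows, manages_own_io=False):
    """Call `func` with either a row generator or the raw SELECT statement."""
    if manages_own_io:
        # Distributed case: the function decides how and where to read the data.
        return func(select, args)
    # Local case: rows are read here and streamed into the function.
    return func(read_rows(select), args)


def sum_first_column(gen, args):
    """Generator-fed function: only contains the computation."""
    return sum(row[0] for row in gen)


def describe_remote_job(select, args):
    """Self-managed function: here it only packages the statement for workers."""
    return {"select": select, "workers": args["workers"]}


def fake_read_rows(select):
    """Stand-in reader that yields fixed rows for illustration."""
    yield from [(1,), (2,), (3,)]


local_result = run_component(
    sum_first_column, "SELECT x FROM t", {}, fake_read_rows)
remote_result = run_component(
    describe_remote_job, "SELECT x FROM t", {"workers": 4},
    fake_read_rows, manages_own_io=True)
```

Whether this choice is made by a flag, by the function signature, or by the submitter is an open design question; the flag is used here only to keep the sketch short.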