sqlflow: Make the TO RUN component not bind to a single SQL engine


Yancey1989 opened this issue (status: Open; 2 comments).

In the current TO RUN design doc, a component uses environment variables as its context, which causes two problems:

The component is bound to a single SQL engine.
It cannot reuse capabilities of the SQLFlow compiler, such as feature derivation.

My initial thought is that the context should be an object instead of environment variables. I will follow up on this issue and find a way to solve the problem.
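The idea of passing the context as an object can be sketched as follows. This is a minimal illustration under stated assumptions, not SQLFlow's design: `RunContext`, its fields (`reader`, `select`, `into`, `params`), `count_rows`, and the two stand-in readers are all hypothetical names. The point is that the caller injects an engine-specific reader into the context, so the same component code runs unchanged against different engines.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Sequence

Row = Sequence[Any]


@dataclass
class RunContext:
    """Hypothetical context object for a TO RUN component.

    Instead of reading environment variables, the component receives
    everything it needs explicitly, including a reader that the caller
    (not the component) chooses for the current SQL engine.
    """
    reader: Callable[[str], Iterable[Row]]  # injected per SQL engine
    select: str                             # the SELECT statement before TO RUN
    into: str = ""                          # optional output table (hypothetical)
    params: Dict[str, str] = field(default_factory=dict)  # component parameters (hypothetical)

    def rows(self) -> Iterable[Row]:
        # Delegate to whatever engine-specific reader was injected.
        return self.reader(self.select)


def count_rows(ctx: RunContext) -> int:
    """Example component: engine-agnostic because it only talks to ctx."""
    return sum(1 for _ in ctx.rows())


# Two stand-in readers for different engines; count_rows does not change.
def mysql_like_reader(sql: str) -> Iterable[Row]:
    return [(1, "a"), (2, "b")]


def hive_like_reader(sql: str) -> Iterable[Row]:
    return [(1, "a"), (2, "b"), (3, "c")]
```

For example, `count_rows(RunContext(reader=mysql_like_reader, select="SELECT * FROM t"))` returns 2, and swapping in `hive_like_reader` returns 3 without touching `count_rows`.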

Jun 11 '20 11:06 Yancey1989

Per the discussion in our daily meeting, we should provide two kinds of interfaces to help users contribute components:

Input/Output: an IO interface so that users don't need to call various SQL engine APIs to fetch data, similar to db_generator: https://github.com/sql-machine-learning/sqlflow/blob/7a14735feb616754e5d4ed293c719e8a4446236c/python/sqlflow_submitter/db.py#L223-L240
Submitter: users only need to care about the computation logic; SQLFlow submits it to various AI platforms.
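The Input/Output interface can be sketched as a helper that turns a SELECT statement into a row generator. This is a hypothetical stand-in, not the actual `db_generator` linked above (its real signature is not reproduced here): `make_row_generator` and `column_sum` are illustrative names, and `sqlite3` replaces a real SQL engine. The helper relies only on DB-API 2.0 cursor methods (`execute`, `fetchmany`, `close`), so the component logic never touches engine-specific code.

```python
import sqlite3
from typing import Any, Callable, Iterator, Sequence

Row = Sequence[Any]


def make_row_generator(conn: Any, select: str,
                       fetch_size: int = 128) -> Callable[[], Iterator[Row]]:
    """Hypothetical IO helper: wraps a DB-API 2.0 connection so a component
    iterates rows without calling any engine-specific API itself."""
    def reader() -> Iterator[Row]:
        cursor = conn.cursor()  # a fresh cursor per call
        try:
            cursor.execute(select)
            while True:
                batch = cursor.fetchmany(fetch_size)  # fetch in chunks to bound memory
                if not batch:
                    break
                yield from batch
        finally:
            cursor.close()
    return reader


def column_sum(rows: Iterator[Row], index: int) -> float:
    """Example component logic: knows nothing about the SQL engine."""
    return sum(row[index] for row in rows)


if __name__ == "__main__":
    # sqlite3 stands in for a real engine; any DB-API connection is used the same way.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER, label INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(5, 0), (4, 0), (6, 1)])
    gen = make_row_generator(conn, "SELECT x, label FROM t", fetch_size=2)
    print(column_sum(gen(), 0))
```

Returning a factory (`reader`) rather than a single generator lets the caller iterate the data more than once, for example once per training epoch, because each call opens a fresh cursor.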

Jun 12 '20 02:06 Yancey1989

I agree with this design: script contributors don't have to write functions from scratch; instead, they can just write:

def user_run_function(gen, args, **kwargs):
    for row in gen:
        # deal with the data
        pass

When submitting a job to a distributed platform, this approach may not work; in that case, the function itself should manage reading data from the original SELECT statement.
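The two cases discussed in this thread, a row-based function fed by a generator and a distributed job that reads the SELECT statement itself, could be handled by a submitter along these lines. All names here (`submit`, `reads_itself`, `distributed_run_function`, and the `threshold` argument) are hypothetical; the distributed branch only returns a string where a real submitter would launch a job on an AI platform.

```python
from typing import Any, Callable, Dict, Iterable, Sequence

Row = Sequence[Any]


def user_run_function(gen: Iterable[Row], args: Dict[str, Any]) -> int:
    """Row-based component: contains only the computation logic."""
    threshold = args.get("threshold", 0)
    count = 0
    for row in gen:
        # deal with the data: count rows whose first column exceeds a threshold
        if row[0] > threshold:
            count += 1
    return count


def distributed_run_function(select: str, args: Dict[str, Any]) -> str:
    """Component for a distributed platform: it receives the SELECT statement
    and is responsible for reading the data itself. Placeholder only."""
    return f"job submitted for: {select}"


def submit(func: Callable[..., Any], select: str,
           read: Callable[[str], Iterable[Row]],
           args: Dict[str, Any], reads_itself: bool = False) -> Any:
    """Hypothetical submitter: decides how the component gets its data."""
    if reads_itself:
        # Distributed case: hand over the SELECT, not the rows.
        return func(select, args)
    # Local case: SQLFlow reads the rows and passes a generator.
    return func(read(select), args)
```

Keeping the choice in the submitter means a contributor writes either a row loop or a self-reading job, and neither variant needs to know which SQL engine is behind `read`.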

Jun 16 '20 07:06 typhoonzero
