Make the TO RUN component not bind to a single SQL engine

Open Yancey1989 opened this issue 4 years ago • 2 comments

According to the current TO RUN design doc, the component uses environment variables as its context, which causes two problems:

  1. The component is bound to one SQL engine.
  2. It cannot reuse the abilities of the SQLFlow compiler tool, e.g. feature derivation.

My initial thought is that the context should be an object instead of environment variables. I will follow this issue and find a way to solve the problem.
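As a rough illustration of the "context as an object" idea, here is a minimal sketch. The class name `RunContext`, its fields, and the example datasource strings are all hypothetical and are not taken from the SQLFlow code base; the point is only that the engine becomes a per-call value instead of process-wide state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class RunContext:
    """Hypothetical context object passed to a TO RUN component."""
    engine: str                 # e.g. "mysql" or "hive"
    datasource: str             # connection string (illustrative only)
    select: str                 # the SELECT statement preceding TO RUN
    options: Dict[str, str] = field(default_factory=dict)


def describe(ctx: RunContext) -> str:
    """Example component logic that depends only on the context object."""
    return f"[{ctx.engine}] {ctx.select}"


# Two contexts for two engines can coexist in one process, which is
# awkward when the context lives in environment variables.
mysql_ctx = RunContext("mysql", "mysql://example/db", "SELECT * FROM iris.train")
hive_ctx = RunContext("hive", "hive://example/db", "SELECT * FROM iris.train")

print(describe(mysql_ctx))
print(describe(hive_ctx))
```

Because the object is passed explicitly, the same component code could in principle run against either engine without changes.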

Yancey1989 avatar Jun 11 '20 11:06 Yancey1989

Per the discussion in our daily meeting, we should have two kinds of interfaces to help users contribute components:

  1. Input/Output: an I/O interface so that users don't need to call each SQL engine's API to fetch data, similar to db_generator: https://github.com/sql-machine-learning/sqlflow/blob/7a14735feb616754e5d4ed293c719e8a4446236c/python/sqlflow_submitter/db.py#L223-L240
  2. Submitter: users only need to care about the computation logic; SQLFlow can submit it to various AI platforms.

Yancey1989 avatar Jun 12 '20 02:06 Yancey1989

I agree with this design. Script contributors would not have to write functions from scratch; instead, they can just write:

def user_run_function(gen, args, ...):
    for row in gen:
        # deal with the data

This may not work when submitting a job to a distributed platform. In that case, the function itself should manage reading data from the original SELECT statement.
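To make the single-machine case concrete, here is a small runnable sketch. It uses the standard-library `sqlite3` module as a stand-in engine, and `row_generator` is a hypothetical helper, not SQLFlow's actual `db_generator` signature. The contributor's function only iterates over rows and never touches an engine API.

```python
import sqlite3


def row_generator(conn, select):
    """Hypothetical I/O helper: yield the rows of `select` one at a time."""
    cursor = conn.cursor()
    cursor.execute(select)
    for row in cursor:
        yield row


def user_run_function(gen, args):
    """Contributor logic: sum the column whose index is args['column']."""
    total = 0.0
    for row in gen:
        total += row[args["column"]]
    return total


# Set up a tiny in-memory table to stand in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x REAL, y REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1.0, 2.0), (3.0, 4.0)])

result = user_run_function(row_generator(conn, "SELECT x, y FROM t"), {"column": 1})
print(result)
```

Swapping the engine would only require a different generator; `user_run_function` stays unchanged.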

typhoonzero avatar Jun 16 '20 07:06 typhoonzero