ibis icon indicating copy to clipboard operation
ibis copied to clipboard

feat: create a table of literal values

Open tswast opened this issue 2 years ago • 6 comments

I'd like to be able to create a table from literal values. This can be handy for small utility tables that I might like to join to. It's also handy for test scripts to avoid having to create physical tables when not needed.

One possible way to implement this would be with the UNNEST operator (https://github.com/ibis-project/ibis/issues/1146), though there may be other ways of accomplishing this.

Expected Ibis usage

With UNNEST support, I'd expect it to be used in combination with an array literal or struct literals.

table = ibis.unnest(
  ibis.array(
    [
      ibis.struct(dict(col1='hello', col2=1), type='struct<col1: string, col2: integer>'),
      ibis.struct(dict(col1='world', col2=2), type='struct<col1: string, col2: integer>'),
      ibis.struct(dict(col1='!', col2=3), type='struct<col1: string, col2: integer>'),
    ]
  )
)

Equivalent BigQuery SQL

Using the UNNEST operator (reference) on an array literal of struct literals results in an anonymous table-valued expression.

SELECT col1, col2
FROM UNNEST(
  [
    STRUCT<col1 STRING, col2 INT64>("hello", 1),
    STRUCT<col1 STRING, col2 INT64>("world", 2),
    STRUCT<col1 STRING, col2 INT64>("!", 3)
  ]
)
-- Optionally, include a table name via the alias feature
AS my_temp_table
Row col1 col2
1 hello 1
2 world 2
3 ! 3

Alternative Ibis expression

Alternatively, maybe ibis.table could take an optional values parameter? In the BigQuery dialect, I'd still expect this to compile to an UNNEST operator.

table = ibis.table(
  values=[
      ibis.struct(dict(col1='hello', col2=1), type='struct<col1: string, col2: integer>'),
      ibis.struct(dict(col1='world', col2=2), type='struct<col1: string, col2: integer>'),
      ibis.struct(dict(col1='!', col2=3), type='struct<col1: string, col2: integer>'),
  ],
)

See also

tswast avatar Apr 28 '22 16:04 tswast

@tswast Thanks for the request. I agree this is something that is incredibly useful for examples and testing and something that people expect to be able to do.

What do you think about ibis.values as the API, like the SQL construct?

This would be a special kind of Node, so dialects would be free to implement however they like, UNNEST or otherwise.

cpcloud avatar Apr 28 '22 16:04 cpcloud

For the project I'm working on, my ideal scenario would be the same kind of node as a table expression, but a special case for literal values wouldn't be so bad.

tswast avatar Apr 28 '22 16:04 tswast

For the project I'm working on, my ideal scenario would be the same kind of node as a table expression, but a special case for literal values wouldn't be so bad.

Is there a specific existing node you're thinking of here? TableExprs can have different underlying nodes depending on the operation.

cpcloud avatar May 02 '22 15:05 cpcloud

I don't think I had an existing node in mind. A coworker was experimenting with Table-Valued-Functions, which if implemented in general could get this working for the BigQuery backend.

I think a more general "values" node makes sense for other backends, as it'd be clearer how to implement in non-SQL backends such as pandas.

tswast avatar Aug 05 '22 15:08 tswast

I suspect that table-valued-functions will require some breaking changes to the SQL compilers to support custom TableNode instances. Right now, our handling of tables is essentially hard-coded.

cpcloud avatar Aug 05 '22 15:08 cpcloud

However, I would be in favor of a top-level ibis.rows() API that produces a Rows node that would be compiled to VALUES in SQL. The pandas backend can call pd.DataFrame on the input during execution.

cpcloud avatar Aug 05 '22 15:08 cpcloud