ibis
ibis copied to clipboard
feat: create a table of literal values
I'd like to be able to create a table from literal values. This can be handy for small utility tables that I might like to join to. It's also handy for test scripts to avoid having to create physical tables when not needed.
One possible way to implement this would be with the UNNEST
operator (https://github.com/ibis-project/ibis/issues/1146), though there may be other ways of accomplishing this.
Expected Ibis usage
With UNNEST
support, I'd expect it to be used in combination with an array literal or struct literals.
table = ibis.unnest(
ibis.array(
[
ibis.struct(dict(col1='hello', col2=1), type='struct<col1: string, col2: integer>'),
ibis.struct(dict(col1='world', col2=2), type='struct<col1: string, col2: integer>'),
ibis.struct(dict(col1='!', col2=3), type='struct<col1: string, col2: integer>'),
]
)
)
Equivalent BigQuery SQL
Using the UNNEST operator (reference) on an array literal of struct literals results in an anonymous table-valued expression.
SELECT col1, col2
FROM UNNEST(
[
STRUCT<col1 STRING, col2 INT64>("hello", 1),
STRUCT<col1 STRING, col2 INT64>("world", 2),
STRUCT<col1 STRING, col2 INT64>("!", 3)
]
)
-- Optionally, include a table name via the alias feature
AS my_temp_table
Row | col1 | col2 |
---|---|---|
1 | hello | 1 |
2 | world | 2 |
3 | ! | 3 |
Alternative Ibis expression
Alternatively, maybe ibis.table could take an optional values
parameter? In the BigQuery dialect, I'd still expect this to compile to an UNNEST operator.
table = ibis.table(
values=[
ibis.struct(dict(col1='hello', col2=1), type='struct<col1: string, col2: integer>'),
ibis.struct(dict(col1='world', col2=2), type='struct<col1: string, col2: integer>'),
ibis.struct(dict(col1='!', col2=3), type='struct<col1: string, col2: integer>'),
],
)
See also
@tswast Thanks for the request. I agree this is something that is incredibly useful for examples and testing and something that people expect to be able to do.
What do you think about ibis.values
as the API, like the SQL construct?
This would be a special kind of Node
, so dialects would be free to implement however they like, UNNEST
or otherwise.
For the project I'm working on, my ideal scenario would be the same kind of node as a table expression, but a special case for literal values wouldn't be so bad.
For the project I'm working on, my ideal scenario would be the same kind of node as a table expression, but a special case for literal values wouldn't be so bad.
Is there a specific existing node you're thinking of here? TableExpr
s can have different underlying nodes depending on the operation.
I don't think I had an existing node in mind. A coworker was experimenting with Table-Valued-Functions, which if implemented in general could get this working for the BigQuery backend.
I think a more general "values" node makes sense for other backends, as it'd be clearer how to implement in non-SQL backends such as pandas.
I suspect that table-valued-functions will require some breaking changes to the SQL compilers to support custom TableNode instances. Right now, our handling of tables is essentially hard-coded.
However, I would be in favor of a top-level ibis.rows()
API that produces a Rows
node that would be compiled to VALUES
in SQL. The pandas backend can call pd.DataFrame
on the input during execution.