`struct(col("*"))` creates separate struct columns instead of single struct with all columns
Describe the bug
When `struct(col("*"))` is used in a select operation, it incorrectly creates a separate struct column for each original column, with each struct containing only one field. It should instead create a single struct column that contains all of the columns as fields.
To Reproduce
import daft
from daft import col

df = daft.from_pydict({
    "embeddings": [[1, 2, 3], [4, 5, 6, 7]],
    "text": ["hello world", "goodbye universe"],
})
df.select(daft.struct(col("*"))).collect()
╭─────────────────────────────────┬──────────────────────────╮
│ struct ┆ struct │
│ --- ┆ --- │
│ Struct[embeddings: List[Int64]] ┆ Struct[text: Utf8] │
╞═════════════════════════════════╪══════════════════════════╡
│ {embeddings: [1, 2, 3], ┆ {text: hello world, │
│ } ┆ } │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ {embeddings: [4, 5, 6, 7], ┆ {text: goodbye universe, │
│ } ┆ } │
╰─────────────────────────────────┴──────────────────────────╯
(Showing first 2 of 2 rows)
Expected behavior
struct(col("*")) should produce a single column containing a struct with all original columns as fields:
╭─────────────────────────────────────────────────────────────╮
│ struct │
│ --- │
│ Struct[embeddings: List[Int64], text: Utf8] │
╞═════════════════════════════════════════════════════════════╡
│ {embeddings: [1, 2, 3], text: hello world} │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ {embeddings: [4, 5, 6, 7], text: goodbye universe} │
╰─────────────────────────────────────────────────────────────╯
Component(s)
Expressions
Additional context
The issue appears to be that the wildcard is distributed over `struct` incorrectly: `struct` is applied to each individual column that matches the wildcard, rather than once to the collection of all matching columns.
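The difference between the two expansions can be made concrete with a toy model. This is a hypothetical sketch in plain Python, not Daft's actual expression rewriter; `Call` and `expand_wildcard` are illustrative names only. It shows how `struct(col("*"))` yields one output column per input column when the call is distributed over the matched columns, versus a single output column when all matches are passed to one call.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Call:
    """Toy function-call expression: a function name plus argument column names.

    Hypothetical stand-in for an expression node; not a Daft type.
    """
    func: str
    args: Tuple[str, ...]


def expand_wildcard(func: str, schema: List[str], variadic: bool) -> List[Call]:
    """Expand func(col("*")) against a list of column names.

    variadic=False distributes func over every matched column, producing one
    output column per input column (the behavior reported in this issue).
    variadic=True passes all matched columns to a single call, producing one
    output column (the expected behavior for struct).
    """
    if variadic:
        return [Call(func, tuple(schema))]
    return [Call(func, (name,)) for name in schema]


schema = ["embeddings", "text"]

# Distributed expansion: two single-field structs.
print(expand_wildcard("struct", schema, variadic=False))
# [Call(func='struct', args=('embeddings',)), Call(func='struct', args=('text',))]

# Collective expansion: one struct with both fields.
print(expand_wildcard("struct", schema, variadic=True))
# [Call(func='struct', args=('embeddings', 'text'))]
```

In this model, the reported bug corresponds to `struct` taking the `variadic=False` path even though it accepts multiple inputs.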
As a workaround, unpacking the columns manually works as expected:
df.select(daft.struct(*df.columns)).collect()
╭─────────────────────────────────────────────╮
│ struct │
│ --- │
│ Struct[embeddings: List[Int64], text: Utf8] │
╞═════════════════════════════════════════════╡
│ {embeddings: [1, 2, 3], │
│ text:… │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ {embeddings: [4, 5, 6, 7], │
│ te… │
╰─────────────────────────────────────────────╯
(Showing first 2 of 2 rows)
@kevinzwang @universalmind303 What are the next steps here?