bug: daft.method(unnest=True) returns the original with_column name

Open everettVT opened this issue 2 months ago • 3 comments

Describe the bug

Instead of naming the output columns after the struct's fields, the with_column name is applied to every unnested column. Selecting "result" then raises an ambiguous column reference error, and selecting "bar" fails because no column with that name is found.

To Reproduce

import daft 

@daft.cls()
class Foo:
    def __init__(self, x: str):
        self.x = x

    @daft.method(
        return_dtype=daft.DataType.struct({
            "bar": daft.DataType.string(),
            "some_int": daft.DataType.int64(),
        }),
        unnest=True
    )
    def do_something(self, input: str):
        return {
            "bar": input + self.x,
            "some_int": 3,
        }

foobar = Foo("bar")

df = daft.from_pydict({"input": ["daft is cool"]}).with_column("result", foobar.do_something(daft.col("input")))

df.show()

# Returns 
╭──────────────┬─────────────────┬────────╮
│ input        ┆ result          ┆ result │
│ ---          ┆ ---             ┆ ---    │
│ String       ┆ String          ┆ Int64  │
╞══════════════╪═════════════════╪════════╡
│ daft is cool ┆ daft is coolbar ┆ 3      │
╰──────────────┴─────────────────┴────────╯

Expected behavior

Should return

╭──────────────┬─────────────────┬──────────╮
│ input        ┆ bar             ┆ some_int │
│ ---          ┆ ---             ┆ ---      │
│ String       ┆ String          ┆ Int64    │
╞══════════════╪═════════════════╪══════════╡
│ daft is cool ┆ daft is coolbar ┆ 3        │
╰──────────────┴─────────────────┴──────────╯

Component(s)

Other

Additional context

No response

everettVT avatar Oct 27 '25 01:10 everettVT

@kevinzwang I know you initially implemented this, along with the new index-based schema resolution. Would you want to take a look at this? I've come across this bug before as well.

universalmind303 avatar Oct 28 '25 16:10 universalmind303

Yeah, this is an issue with with_column and unnest: with_column simply adds an alias to the expression, which is then propagated to all of the child expressions. I would suggest using df.select("*", foobar.do_something(..)) instead for now. The interaction of unnest with our projection operators is something that I don't have a great story around right now, and we should think about it. Open to hearing what others think!
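
To make the mechanism above concrete, here is a toy model in plain Python. It is only a sketch of the naming behavior described in this thread, not Daft's actual implementation: `unnest_fields` and its `alias` parameter are hypothetical names invented for illustration, and the assumption that select applies no alias is inferred from the suggested workaround.

```python
# Toy model only: none of these names are Daft APIs.

def unnest_fields(struct_fields, alias=None):
    """Return the output column names of an unnested struct expression.

    If an alias is set on the parent expression and pushed down to every
    child, each child loses its own field name and takes the alias instead.
    """
    if alias is None:
        # No alias: each child keeps its struct field name.
        return list(struct_fields)
    # Alias propagated to all children: every column gets the same name.
    return [alias for _ in struct_fields]


fields = ["bar", "some_int"]

# Modeled with_column("result", expr): the expression is aliased to "result".
with_column_names = ["input"] + unnest_fields(fields, alias="result")

# Modeled select("*", expr): no alias is applied (assumption).
select_names = ["input"] + unnest_fields(fields)

print(with_column_names)  # ['input', 'result', 'result']
print(select_names)       # ['input', 'bar', 'some_int']
```

In this model the duplicated "result" name is what makes a later lookup of "result" ambiguous and a lookup of "bar" fail.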

kevinzwang avatar Oct 28 '25 18:10 kevinzwang

I was wondering whether unnest could just use the struct column name as a prefix for the nested fields. That way you could unnest structured outputs multiple times in the same dataframe without collisions.
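
A minimal sketch of that prefixing idea, again in plain Python rather than Daft itself. The helper `prefixed_names`, the "_" separator, and the second column name "result2" are all assumptions made for illustration; the comment above does not specify a naming format.

```python
# Hypothetical naming scheme, not an existing Daft feature.

def prefixed_names(struct_name, field_names, sep="_"):
    """Build output column names as <struct_name><sep><field_name>."""
    return [f"{struct_name}{sep}{field}" for field in field_names]


fields = ["bar", "some_int"]

# Two unnested struct columns with the same fields in one dataframe.
first = prefixed_names("result", fields)
second = prefixed_names("result2", fields)

columns = ["input"] + first + second
print(columns)
# ['input', 'result_bar', 'result_some_int', 'result2_bar', 'result2_some_int']

# Every name is unique, so selecting any of them would be unambiguous.
print(len(columns) == len(set(columns)))  # True
```

One trade-off of this scheme is that the bare field names ("bar", "some_int") from the expected output in the issue would no longer appear as column names.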

everettVT avatar Oct 28 '25 19:10 everettVT