Syntactic sugar for nested getting in column names

Open jaychia opened this issue 1 year ago • 4 comments

Is your feature request related to a problem? Please describe.

When retrieving nested columns in structs, we currently rely on the Expression.struct.get(...) accessor. However, for deeply nested structs this may get extremely verbose.

Instead, one option is to allow . delimiters in the column name itself. For example:

df = df.with_column("nested_bar", df["foo.bar"])
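
To make the verbosity concrete, here is a minimal sketch of how a dotted name could desugar into one struct.get call per nesting level. It does not use Daft itself: `FakeExpr` and `desugar` are hypothetical stand-ins that only record the access path as text, and only the `.struct.get(...)` shape is taken from the accessor mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeExpr:
    """Hypothetical stand-in for an expression; it only records the access path as text."""

    text: str

    @property
    def struct(self) -> "FakeExpr":
        # Mimics a `.struct` accessor namespace; returns self so `.get` can chain.
        return self

    def get(self, field: str) -> "FakeExpr":
        # Append one nested-field access to the recorded path.
        return FakeExpr(f'{self.text}.struct.get("{field}")')


def desugar(name: str) -> FakeExpr:
    """Split a dotted name and chain one struct.get per nested field (assumed rule)."""
    root, *fields = name.split(".")
    expr = FakeExpr(f'col("{root}")')
    for field in fields:
        expr = expr.struct.get(field)
    return expr


# Three levels of nesting already produce a long chain.
print(desugar("foo.bar.baz").text)
```

This sketch assumes every . is a nesting delimiter, so a column whose name literally contains a dot would need some escaping rule that the proposal does not yet specify.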

jaychia avatar Mar 08 '24 19:03 jaychia

👀 @kevinzwang

samster25 avatar Mar 08 '24 20:03 samster25

Yeah this is a good idea. Could be applicable to list accessors too(?). When do we want to get this done? Deriving expressions from column names is something we'll eventually get around to in selector expressions, so I'm wondering if it'll make sense to think about these two things together.

Would also want to make sure it doesn't conflict with selector expression syntax since foo.bar could also be interpreted as a regex
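
A rough illustration of that ambiguity, using only Python's standard `re` module rather than any Daft selector syntax. The column names are made up for the example:

```python
import re

# Hypothetical column names, chosen to show how the two readings differ.
columns = ["foo", "foo.bar", "foo_bar", "fooXbar"]

# Read as a regex, "." matches any single character.
as_regex = [c for c in columns if re.fullmatch("foo.bar", c)]

# Read as a literal name, only the exact string matches.
as_literal = [c for c in columns if re.fullmatch(re.escape("foo.bar"), c)]

print(as_regex)    # ['foo.bar', 'foo_bar', 'fooXbar']
print(as_literal)  # ['foo.bar']
```

A nested-access reading would be a third interpretation of the same string, selecting field bar inside struct column foo.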

kevinzwang avatar Mar 08 '24 20:03 kevinzwang

@kevinzwang I think this should be much simpler than the selector expressions that we talked about, since col(a.b.c) will always refer to exactly 1 column, whereas selector expressions can refer to many.

samster25 avatar Mar 09 '24 06:03 samster25

@kevinzwang to sync with @samster25 on this issue

jaychia avatar Apr 23 '24 19:04 jaychia