[Enh]: Add `narwhals.empty_like` or another way to construct an empty Narwhals frame with known schema and implementation
We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?
https://anam-org.github.io/metaxy/main/
Please describe the purpose of the new feature or describe the problem to solve.
I would like to make an empty Narwhals frame that looks exactly like another one, but doesn't hold any data.
I couldn't find a way to do it naturally, something like:
df = nw.empty_like(another_df)
With Polars, it is possible to pass an empty data structure and a schema when constructing a new DataFrame:
df = pl.DataFrame([], schema=schema)
With Narwhals this would make less sense, because implementations need to be aware of the original data source (e.g. with Ibis, which table to SELECT from). So I imagine with Narwhals we would need to start from another DataFrame instance that's already valid, hence the .empty_like suggestion.
Or, alternatively, DataFrame.empty().
Suggest a solution if possible.
No response
If you have tried alternatives, please describe them below.
An alternative right now is doing .head(0) on an existing DataFrame.
Additional information that may help us understand your needs.
No response
Thanks for raising this, @danielgafni
Since #2874, you can do this for eager backends:
import narwhals as nw
schema = nw.Schema({"a": nw.String(), "b": nw.Int8()})
nw.DataFrame.from_dict({}, schema, backend="polars")
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| shape: (0, 2) |
| ┌─────┬─────┐ |
| │ a ┆ b │ |
| │ --- ┆ --- │ |
| │ str ┆ i8 │ |
| ╞═════╪═════╡ |
| └─────┴─────┘ |
└──────────────────┘
If we wanted to extend this to support lazy backends, I think we could do either or both of the following:
- Add `nw.LazyFrame.from_dict`
  - This would have wider benefits than this use case
  - Requiring a `schema` would be needed for the empty case though
- Add `Schema.to_frame`
  - Mentioned in https://github.com/narwhals-dev/narwhals/pull/2874/files#r2223313335
  - The `eager` argument has the same quirk as mentioned in (https://github.com/narwhals-dev/narwhals/pull/2895#discussion_r2491382190)
Thanks for the feature request. I would argue this should be achieved via DataFrame.clear(n=0) (see https://github.com/narwhals-dev/narwhals/issues/2890) (or by adding support for Schema.to_frame).
- Yes, I definitely need lazy frames to work.
- I also want to inherit the same backend as the original frame, so constructing a new `LazyFrame` from scratch isn't an option.

`DataFrame.clear(n=0)` looks like what I am looking for.
I am happy to move forward with the PR; I am about to push support for lazy backends. We will need to think about what to do regarding the pandas non-nullable backend and the n>0 case.
What's the idea behind the Pandas support, by the way? I bet it's problematic and causes friction all over the place. Or am I wrong?
It's not like existing Pandas users could switch to Narwhals - the API is different! And it's not like Polars users would start using Pandas through Narwhals... lol.
Is this a historical thing?
> I bet it's problematic and causes friction all over the place.
Well you'd be right about that 😂
> Is this a historical thing?
This is the kind of use-case where supporting both pandas and polars (+ many more) via narwhals makes sense
- https://plotly.com/blog/chart-smarter-not-harder-universal-dataframe-support/
> > I bet it's problematic and causes friction all over the place.
>
> Well you'd be right about that 😂
This is the story of Narwhals in a nutshell! 😢