
[Enh]: Allow for kwargs in `LazyFrame.collect`

Open FBruzzesi opened this issue 1 year ago • 1 comments

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

No response

Please describe the purpose of the new feature or describe the problem to solve.

Allow passing arguments through `collect` to the underlying backend method call. For instance, with the current implementation it is not possible to run Polars with its streaming engine.

Ideally this should be as flexible as possible and not necessarily follow the Polars API, because each backend's collect-like function accepts different arguments.

Suggest a solution if possible.

I suggest two possible implementations:

  • sklearn-like: pass arguments with a prefix convention such as `polars__streaming`, `polars__engine`, `dask__optimize_graph` and so on

  • engine specific dict:

     def compute(
         self,
         *,
         polars_kwargs: dict[str, Any] | None = None,
         dask_kwargs: dict[str, Any] | None = None,
         # ... one `<engine>_kwargs: dict[str, Any] | None = None` per supported engine
     ): ...
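
To make the two options concrete, here is a minimal sketch of how each could route arguments to the active backend. Everything in it is hypothetical and for illustration only: `FakeLazyFrame`, `split_prefixed_kwargs`, `collect_prefixed`, and `collect_dicts` are not part of Narwhals, and the stubbed `_native_collect` only echoes what it receives instead of calling a real backend.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def split_prefixed_kwargs(kwargs: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Group `<backend>__<option>` keys by backend (sklearn-like convention)."""
    grouped: dict[str, dict[str, Any]] = defaultdict(dict)
    for key, value in kwargs.items():
        backend, sep, option = key.partition("__")
        if not sep or not backend or not option:
            raise TypeError(f"expected '<backend>__<option>', got {key!r}")
        grouped[backend][option] = value
    return dict(grouped)


class FakeLazyFrame:
    """Hypothetical stand-in for a backend-wrapping lazy frame (not Narwhals)."""

    def __init__(self, backend: str) -> None:
        self.backend = backend

    def _native_collect(self, **kwargs: Any) -> dict[str, Any]:
        # A real implementation would call the backend's own collect-like
        # method here; this stub just echoes the arguments it was given.
        return {"backend": self.backend, "kwargs": kwargs}

    def collect_prefixed(self, **kwargs: Any) -> dict[str, Any]:
        # Option 1: forward only the options addressed to the active backend.
        own = split_prefixed_kwargs(kwargs).get(self.backend, {})
        return self._native_collect(**own)

    def collect_dicts(
        self,
        *,
        polars_kwargs: dict[str, Any] | None = None,
        dask_kwargs: dict[str, Any] | None = None,
    ) -> dict[str, Any]:
        # Option 2: one explicit dict per engine; the others are ignored.
        per_backend = {"polars": polars_kwargs, "dask": dask_kwargs}
        return self._native_collect(**(per_backend.get(self.backend) or {}))
```

With this sketch, `FakeLazyFrame("polars").collect_prefixed(polars__streaming=True, dask__optimize_graph=False)` forwards only `streaming=True`. The prefix form keeps the signature open-ended, while the per-engine dicts are easier to type-check but require a new parameter for every backend.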
    

If you have tried alternatives, please describe them below.

No response

Additional information that may help us understand your needs.

No response

FBruzzesi avatar Sep 22 '24 08:09 FBruzzesi

yup agree - for duckdb for example there's a variety of formats you may want to collect into (pyarrow, pandas, python..)

MarcoGorelli avatar Sep 22 '24 12:09 MarcoGorelli

Closing in favor of #1479

FBruzzesi avatar Dec 01 '24 14:12 FBruzzesi

😄 sorry i'd forgotten about this one

MarcoGorelli avatar Dec 01 '24 14:12 MarcoGorelli