Improve polars.Expr.arr.get

Open gfmartins opened this issue 1 year ago • 2 comments

Problem description

Allow arr.get to return the values at multiple indexes of one array at once. The result would itself be an array. Negative indexes could also be supported.

Example: col("foo").arr.get([0, 2])

It should return the values at indexes 0 and 2, skipping index 1.

Now the hard part:

Let's say the foo array has length 2, which means index 2 doesn't exist. Should it return [val, NULL] or [val]?

I think it should return [val] (without the null), because otherwise I'm not sure it would be possible to do something like this: col("foo").arr.get([0,2]).arr.eval(pl.element().struct.field('xpto'))

On further thought, however, this may break compatibility, because get currently returns a single value. Should get([0]) return a value, or an array containing that value?

That said, it probably wouldn't hurt if this were implemented as a new function, say "polars.Expr.arr.multi_get".
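To make the two candidate behaviours concrete, here is a minimal pure-Python sketch of the proposed semantics applied to a single array. The name `multi_get` and the `drop_missing` flag are hypothetical, chosen only for illustration; this is not an existing polars API.

```python
from typing import Any, List, Sequence


def multi_get(values: Sequence[Any], indexes: Sequence[int],
              drop_missing: bool = True) -> List[Any]:
    """Hypothetical helper: pick several positions from one array.

    Negative indexes count from the end. An out-of-range index is either
    skipped (drop_missing=True) or replaced by None (drop_missing=False).
    """
    n = len(values)
    out: List[Any] = []
    for i in indexes:
        pos = i + n if i < 0 else i  # normalise negative indexes
        if 0 <= pos < n:
            out.append(values[pos])
        elif not drop_missing:
            out.append(None)  # keep a placeholder for the missing slot
    return out


foo = ["x", "y"]  # length 2, so index 2 does not exist
print(multi_get(foo, [0, 2]))                      # ['x']
print(multi_get(foo, [0, 2], drop_missing=False))  # ['x', None]
print(multi_get(foo, [-1]))                        # ['y']
```

With `drop_missing=True` the result never contains nulls, which is the property the chained `.arr.eval(...)` example above relies on. With `drop_missing=False` the output length always matches the number of requested indexes.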

gfmartins avatar Jan 04 '23 16:01 gfmartins

You can use eval and pl.element().take().

ritchie46 avatar Jan 06 '23 07:01 ritchie46

With pl.element().take() you can't access other columns, which means you can only provide a constant index. But say you have two array columns (A, B) whose elements are associated by index, and you want to sort the arrays in column A, with B reordered according to its index association with A.

A great way of achieving this would be:

```
df = df.with_columns(pl.col("A").arr.arg_sort().alias("sorted_index"))
df = df.select([pl.col("A").arr.get(pl.col("sorted_index")), pl.col("B").arr.get(pl.col("sorted_index"))])
```

But currently we don't have `arr.arg_sort()` and `arr.get` expects only a single value.
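The intended per-row behaviour can be sketched in plain Python, without polars. The helpers `arg_sort` and `gather` below are hypothetical stand-ins for the requested `arr.arg_sort()` and a multi-index `arr.get`; they only illustrate the semantics, not how polars would implement them.

```python
from typing import Any, List, Sequence


def arg_sort(values: Sequence[Any]) -> List[int]:
    """Return the indexes that would sort `values` in ascending order."""
    return sorted(range(len(values)), key=lambda i: values[i])


def gather(values: Sequence[Any], indexes: Sequence[int]) -> List[Any]:
    """Pick the elements of `values` at the given indexes, in that order."""
    return [values[i] for i in indexes]


# Each row holds two arrays whose elements are associated by position.
rows = [
    {"A": [3, 1, 2], "B": ["c", "a", "b"]},
    {"A": [20, 10], "B": ["y", "x"]},
]

for row in rows:
    order = arg_sort(row["A"])        # per-row sort order of A
    row["A"] = gather(row["A"], order)
    row["B"] = gather(row["B"], order)  # B follows A's order

print(rows)
```

The key point is that the index list differs from row to row, which is why a constant index passed to pl.element().take() is not enough here.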

mzaks avatar Jan 06 '23 08:01 mzaks