polars
Improve polars.Expr.arr.get
Problem description
Allow arr.get to return the values at multiple indexes of one array at once. The result would itself be an array. Negative indexes could also be supported.
Example:
col("foo").arr.get([0,2])
This should return the values at indexes 0 and 2, skipping index 1.
Now the hard part:
Let's say that the foo array has length 2, which means that index 2 doesn't exist.
Should it return:
[val, NULL]
or
[val]
I think it should return [val] (without the null), because otherwise I'm not sure it would be possible to do something like this:
col("foo").arr.get([0,2]).arr.eval(pl.element().struct.field('xpto'))
However, on further thought, this may break compatibility, because get returns a single value by default. Should get([0]) return a value, or an array with the value inside?
That said, it probably wouldn't hurt if this were implemented as a new function, say "polars.Expr.arr.multi_get".
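To make the two out-of-bounds options concrete, here is a minimal pure-Python sketch of the semantics under discussion. It is not polars code: `multi_get` and its `null_on_oob` flag are hypothetical names used only for illustration, and the sketch operates on a plain Python list rather than on an expression.

```python
from typing import Any, List, Optional, Sequence


def multi_get(
    values: Sequence[Any],
    indexes: Sequence[int],
    null_on_oob: bool = False,
) -> List[Optional[Any]]:
    """Hypothetical helper: return the items of `values` at `indexes`.

    Negative indexes count from the end. An out-of-bounds index is
    skipped by default, or replaced with None when `null_on_oob` is True.
    """
    n = len(values)
    result: List[Optional[Any]] = []
    for i in indexes:
        if -n <= i < n:
            result.append(values[i])
        elif null_on_oob:
            result.append(None)
        # otherwise the out-of-bounds index is dropped silently
    return result


foo = ["a", "b"]  # length 2, so index 2 does not exist

print(multi_get(foo, [0, 2]))                    # ['a']
print(multi_get(foo, [0, 2], null_on_oob=True))  # ['a', None]
print(multi_get(foo, [0]))                       # ['a'], always a list
print(multi_get(foo, [-1]))                      # ['b']
```

Keeping this as a separate function means the multi-index form always returns a list, even for a single index, so the existing single-value get would not need to change.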
You can use `eval` and `pl.element().take()`.
With `pl.element().take()` you can't access other columns, which means you can only provide a constant index. But say you have two array columns (A, B) whose elements are associated by index, and you want to sort the data frame by column A, with B reordered according to its index association with A.
A great way of achieving this would be:
```
df = df.with_columns(pl.col("A").arr.arg_sort().alias("sorted_index"))
df = df.select([
    pl.col("A").arr.get(pl.col("sorted_index")),
    pl.col("B").arr.get(pl.col("sorted_index")),
])
```
But currently we don't have `arr.arg_sort()` and `arr.get` expects only a single value.
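For reference, the intended behaviour on a single row can be sketched in plain Python. This only illustrates the semantics being requested; `arg_sort` and `gather` below are hypothetical helpers on Python lists, not polars functions.

```python
from typing import Any, List, Sequence


def arg_sort(values: Sequence[Any]) -> List[int]:
    """Return the indexes that would sort `values` in ascending order."""
    return sorted(range(len(values)), key=lambda i: values[i])


def gather(values: Sequence[Any], indexes: Sequence[int]) -> List[Any]:
    """Return the items of `values` in the order given by `indexes`."""
    return [values[i] for i in indexes]


# One row with two arrays whose elements are associated by position.
a = [30, 10, 20]
b = ["x", "y", "z"]

sorted_index = arg_sort(a)

print(sorted_index)             # [1, 2, 0]
print(gather(a, sorted_index))  # [10, 20, 30]
print(gather(b, sorted_index))  # ['y', 'z', 'x']
```

Because both arrays are reordered with the same index list, each element of B stays paired with its original element of A.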