Window function to select a subset of records
Window functions were implemented in https://github.com/iterative/datachain/pull/515. However, some common use cases still require a lot of work. For example, selecting a subset with N records in each class (here N = 5) might look like this:
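from datachain import C, func

# `dc` is assumed to be an existing DataChain instance.
# Number the rows of each class in random order.
window = func.window(partition_by="signal.class", order_by="sys.rand")
(
    dc.mutate(row_number=func.row_number().over(window))
    .filter(C("row_number") < 6)  # row_number starts at 1, so this keeps 5 rows per class
    .select_except("row_number")  # drop the helper column
)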
We need to implement an easier way to do this, perhaps with some helper functions.
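To make the expected behavior of such a helper concrete, here is a minimal pure-Python sketch of the semantics (partition by class, order randomly within each partition, keep the first N rows). The name `sample_per_group` and its parameters are hypothetical and not part of the datachain API; it only illustrates what a helper would need to return, not how it should be implemented on top of window functions.

```python
from __future__ import annotations

import random
from collections import defaultdict
from typing import Any, Callable, Hashable, Iterable


def sample_per_group(
    records: Iterable[Any],
    key: Callable[[Any], Hashable],
    n: int,
    seed: int | None = None,
) -> list[Any]:
    """Return at most `n` randomly chosen records for each distinct key value.

    Hypothetical helper, not a datachain function.
    """
    rng = random.Random(seed)

    # Counterpart of partition_by: bucket the records by class.
    partitions: dict[Hashable, list[Any]] = defaultdict(list)
    for record in records:
        partitions[key(record)].append(record)

    selected: list[Any] = []
    for rows in partitions.values():
        # Counterpart of order_by="sys.rand": random order inside the partition.
        rng.shuffle(rows)
        # Counterpart of filtering on row_number: keep the first n rows.
        selected.extend(rows[:n])
    return selected


# Toy data: 20 "cat" records and 10 "dog" records.
records = [{"id": i, "cls": "cat" if i % 3 else "dog"} for i in range(30)]
subset = sample_per_group(records, key=lambda r: r["cls"], n=5, seed=0)
```

A class with fewer than N records simply contributes all of its records, which matches what the `row_number` filter does in the window-function version.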
See also this comment.
I think Ibis has a range window parameter that helps with this (?).