
Window function to select a subset of the records

Open dreadatour opened this issue 1 year ago • 1 comments

Window functions were implemented in https://github.com/iterative/datachain/pull/515. However, some common use cases still require a lot of work. For example, selecting a subset of the records with N records in each class may look like this:

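from datachain import C, func

# `dc` is assumed to be an existing DataChain with a `signal.class` column.
# Number the rows of each class in random order, then keep the first 5.
window = func.window(partition_by="signal.class", order_by="sys.rand")
(
    dc.mutate(row_number=func.row_number().over(window))
    .filter(C("row_number") < 6)  # row numbers start at 1, so this keeps 5 per class
    .select_except("row_number")  # drop the helper column
)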

We need an easier way to do this, perhaps with some helper functions.
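To make the intended behavior concrete, here is a minimal plain-Python sketch of what such a helper would compute. It is not datachain API: the name `sample_per_group` and its parameters are hypothetical, and it works on in-memory dicts instead of a chain. It mirrors the three steps of the window query above (partition by class, order randomly, keep the first N rows).

```python
from __future__ import annotations

import random
from collections import defaultdict
from typing import Any, Callable, Hashable, Iterable


def sample_per_group(
    records: Iterable[dict[str, Any]],
    key: Callable[[dict[str, Any]], Hashable],
    n: int,
    seed: int | None = None,
) -> list[dict[str, Any]]:
    """Return at most `n` randomly chosen records from each group.

    Hypothetical helper for illustration only; not part of datachain.
    """
    rng = random.Random(seed)

    # Step 1: partition the records by the grouping key (partition_by).
    groups: dict[Hashable, list[dict[str, Any]]] = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)

    sampled: list[dict[str, Any]] = []
    for members in groups.values():
        # Step 2: put each group in random order (order_by="sys.rand").
        rng.shuffle(members)
        # Step 3: keep the first n rows of the group (row_number <= n).
        sampled.extend(members[:n])
    return sampled


# Toy data: 10 records of class "cat" and 3 of class "dog".
records = [{"cls": "cat", "id": i} for i in range(10)]
records += [{"cls": "dog", "id": i} for i in range(3)]

subset = sample_per_group(records, key=lambda r: r["cls"], n=5, seed=0)
counts = {c: sum(1 for r in subset if r["cls"] == c) for c in ("cat", "dog")}
```

With this toy data, the class with 10 records is cut down to 5 and the class with only 3 records is kept whole, which is the same result the `row_number < 6` filter gives.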

See also this comment.

dreadatour, Oct 19 '24 04:10

I think IBIS has a range window parameter that helps with that (?).

shcheklein, Oct 19 '24 16:10