Rank in a groupby

Open jdcaicedo251 opened this issue 4 years ago • 6 comments

Hi all, does anyone know how to calculate the rank of a column within each group?

This would be the pandas equivalent: df.groupby('var1')['var2'].rank(ascending=False, method='first')

Thanks in advance :)
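For anyone unsure what that pandas call returns, here is a minimal sketch on a made-up dataframe. The column names var1 and var2 come from the question; the data values are invented for illustration only.

```python
import pandas as pd

# Toy data (hypothetical): two groups in var1, with a tie inside group "a".
df = pd.DataFrame({
    "var1": ["a", "a", "a", "b", "b"],
    "var2": [10, 30, 30, 5, 7],
})

# Rank var2 within each var1 group, largest value first.
# method="first" gives tied values distinct ranks in order of appearance.
df["rank"] = df.groupby("var1")["var2"].rank(ascending=False, method="first")

ranks = df["rank"].tolist()
print(df)
```

The result has one rank per row, restarting at 1 inside every group, so it is a column of the same length as the dataframe rather than one value per group.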

jdcaicedo251 avatar Jun 05 '20 01:06 jdcaicedo251

I have to say, after going through the pandas documentation, and trying it out in the notebook, I don't understand what this should do. @JovanVeljanoski maybe you understand?

maartenbreddels avatar Jun 08 '20 07:06 maartenbreddels

Hi @jdcaicedo251

Vaex groupby currently supports only aggregation functions. The .rank() method is not an aggregation function; in this case it would be applied per group, and the output would be an expression with the same length as the original dataframe.
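To make the "one rank per row, computed per group" idea concrete, here is a rough sketch in plain NumPy. It is not a vaex API: the helper name group_rank_first is hypothetical, and it assumes integer group codes and numeric values that fit in memory as arrays.

```python
import numpy as np

def group_rank_first(groups, values, ascending=True):
    """Per-group rank with ties broken by order of appearance (hypothetical helper)."""
    groups = np.asarray(groups)
    values = np.asarray(values)
    # Negating the values turns an ascending sort into a descending one
    # (assumes signed numeric values).
    keys = values if ascending else -values
    # Stable sort: primary key is the group, secondary key is the value.
    order = np.lexsort((keys, groups))
    sorted_groups = groups[order]
    n = len(order)
    # Mark the first sorted position of every group.
    is_start = np.ones(n, dtype=bool)
    is_start[1:] = sorted_groups[1:] != sorted_groups[:-1]
    # For each sorted position, the index where its group begins.
    start_idx = np.maximum.accumulate(np.where(is_start, np.arange(n), 0))
    # Scatter the within-group positions back to the original row order.
    ranks = np.empty(n, dtype=np.int64)
    ranks[order] = np.arange(n) - start_idx + 1
    return ranks

ranks = group_rank_first([0, 0, 0, 1, 1], [10, 30, 30, 5, 7], ascending=False)
print(ranks)
```

The output array lines up row by row with the inputs, which is the shape of result a per-group rank would need to have.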

I believe the functionality to apply a function to the dataframe per group is part of one of @maartenbreddels's PRs, but I do not recall exactly why it was not merged. Perhaps it was part of some wider functionality.

Hey @maartenbreddels, the .rank() method is kind of like .argsort(), with the difference that you get to decide how to deal with ties: tied values can take the average of their ranks, the highest, the lowest, and there might be some other options as well. For example, ranking [5, 1, 2, 2] gives [4, 1, 2, 2] if ties are assigned the lowest rank, and [4, 1, 3, 3] if they are assigned the highest. It would indeed be nice if we could support this (as a method outside of groupby).
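As a quick illustration of the tie-handling options, this small pandas sketch ranks the same [5, 1, 2, 2] example with several values of the method parameter. The variable names are only for illustration.

```python
import pandas as pd

s = pd.Series([5, 1, 2, 2])

rank_min = s.rank(method="min").tolist()        # ties share the lowest rank
rank_max = s.rank(method="max").tolist()        # ties share the highest rank
rank_avg = s.rank(method="average").tolist()    # ties share the mean of their ranks
rank_first = s.rank(method="first").tolist()    # ties ranked in order of appearance

print(rank_min)    # [4.0, 1.0, 2.0, 2.0]
print(rank_max)    # [4.0, 1.0, 3.0, 3.0]
print(rank_avg)    # [4.0, 1.0, 2.5, 2.5]
print(rank_first)  # [4.0, 1.0, 2.0, 3.0]
```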

JovanVeljanoski avatar Jun 16 '20 18:06 JovanVeljanoski

Thank you! Implementing .rank would be very useful.

jdcaicedo251 avatar Jun 18 '20 04:06 jdcaicedo251

Hi all, just wanted to know whether a rank function based on a column has been implemented?

rajeshkumarrs avatar Aug 08 '20 03:08 rajeshkumarrs

Any update on that?

felixnext avatar Sep 08 '21 10:09 felixnext

Hello, any updates on adding a rank function to calculate rank within groups?

prasadchikane avatar Aug 13 '22 14:08 prasadchikane