vaex Rank in a groupby

Hi all, Does someone know how to calculate the rank of a column by a group?

This would be the pandas equivalent:

df.groupby('var1')['var2'].rank(ascending=False, method='first')

Thanks in advance :)
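For readers unfamiliar with that pandas call, here is a minimal pure-Python sketch of what it computes, as far as I understand the pandas semantics: within each group of var1, rows are ranked by var2 from largest to smallest, and ties get consecutive ranks in order of appearance. The function name `group_rank_first` and the sample data are made up for illustration; pandas itself returns floats rather than ints.

```python
from collections import defaultdict


def group_rank_first(keys, values, ascending=False):
    """Rank numeric `values` within each group of `keys`.

    Ties are broken by order of appearance (like method='first').
    The name and signature are hypothetical, not part of pandas or vaex.
    """
    # Collect the row positions that belong to each group.
    rows_by_group = defaultdict(list)
    for row, key in enumerate(keys):
        rows_by_group[key].append(row)

    ranks = [0] * len(values)  # one rank per input row, not one per group
    for rows in rows_by_group.values():
        # sorted() is stable, so tied values keep their original order.
        # Negating the key gives a descending sort (numeric values only).
        if ascending:
            ordered = sorted(rows, key=lambda r: values[r])
        else:
            ordered = sorted(rows, key=lambda r: -values[r])
        for rank, row in enumerate(ordered, start=1):
            ranks[row] = rank
    return ranks


var1 = ["a", "a", "a", "b", "b"]
var2 = [10, 30, 30, 7, 9]
print(group_rank_first(var1, var2))  # [3, 1, 2, 2, 1]
```

Note that the result has one entry per row of the input, which is what distinguishes it from an aggregation such as a per-group sum or mean.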

Jun 05 '20 01:06 jdcaicedo251

I have to say, after going through the pandas documentation, and trying it out in the notebook, I don't understand what this should do. @JovanVeljanoski maybe you understand?

Jun 08 '20 07:06 maartenbreddels

Hi @jdcaicedo251

Vaex groupby currently supports only aggregation functions. The .rank() method is not an aggregation function; in this case it would be applied per group, and the output would be an expression of the same length as the original dataframe.

I believe the functionality to apply a function on the dataframe per group is part of one of the PRs of @maartenbreddels, but I do not recall exactly why it was not merged. Perhaps it was part of some wider functionality.

Hey @maartenbreddels, the .rank() method is kind of like .argsort(), with the difference that you get to decide how to deal with ties: you can take the average of the tied positions, the highest, the lowest, and there might be some other options as well. For example, ranking [5, 1, 2, 2] gives [4, 1, 3, 3] when ties take the highest rank, or [4, 1, 2, 2] when they take the lowest. It would indeed be nice if we could support this (as a method outside of groupby).
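To make the tie handling concrete, here is a small pure-Python sketch of ranking with four tie rules. The function `rank` and its `method` names are chosen here to mirror the pandas option names; this is only an illustration, not how pandas or vaex implement it.

```python
def rank(values, method="min"):
    """Return 1-based ascending ranks of `values`, resolving ties by `method`.

    Hypothetical helper for illustration; `method` is one of
    'min', 'max', 'average', 'first'.
    """
    if method not in ("min", "max", "average", "first"):
        raise ValueError(f"unknown method: {method!r}")

    # Stable argsort: positions of the values in ascending order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)

    start = 0
    while start < len(order):
        # Extend `end` over the run of values tied with the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            if method == "min":
                r = start + 1              # lowest rank in the tied run
            elif method == "max":
                r = end + 1                # highest rank in the tied run
            elif method == "average":
                r = (start + end) / 2 + 1  # mean rank of the tied run
            else:
                r = pos + 1                # 'first': order of appearance
            ranks[order[pos]] = r
        start = end + 1
    return ranks


data = [5, 1, 2, 2]
print(rank(data, "min"))      # [4, 1, 2, 2]
print(rank(data, "max"))      # [4, 1, 3, 3]
print(rank(data, "average"))  # [4, 1, 2.5, 2.5]
print(rank(data, "first"))    # [4, 1, 2, 3]
```

With 'first', the result is the same as applying an argsort twice and adding one, which is where the similarity to .argsort() comes from.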

Jun 16 '20 18:06 JovanVeljanoski

Thank you! Implementing .rank would be very useful.

Jun 18 '20 04:06 jdcaicedo251

Hi all, I just wanted to know whether a rank function based on a column has been implemented.

Aug 08 '20 03:08 rajeshkumarrs

Any update on that?

Sep 08 '21 10:09 felixnext

Hello, Any updates on adding a rank function to calculate rank on groups?

Aug 13 '22 14:08 prasadchikane
