vaex
Rank in a groupby
Hi all, does anyone know how to calculate the rank of a column within each group?
This would be the pandas equivalent:
df.groupby('var1')['var2'].rank(ascending=False, method='first')
Thanks in advance :)
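To make the request concrete, here is a small sketch of what the pandas call does on a toy dataframe. The column names `var1` and `var2` come from the question; the data and the `rank` output column are made up for illustration.

```python
import pandas as pd

# Toy data (an assumption for illustration): two groups, one tie in group "a".
df = pd.DataFrame({
    "var1": ["a", "a", "a", "b", "b"],
    "var2": [10, 30, 30, 5, 7],
})

# Rank var2 within each var1 group, largest value first.
# method="first" breaks ties by order of appearance in the dataframe.
df["rank"] = df.groupby("var1")["var2"].rank(ascending=False, method="first")

print(df)
```

The result has one rank per input row (not one row per group), and ranks restart at 1 in every group, which is why this is not a plain aggregation.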
I have to say, after going through the pandas documentation, and trying it out in the notebook, I don't understand what this should do. @JovanVeljanoski maybe you understand?
Hi @jdcaicedo251
Vaex groupby currently supports only aggregation functions. The .rank() method is not an aggregation function; in this case it would be applied per group, and the output would be an expression with the same length as the original dataframe.
I believe the functionality to apply a function to the dataframe per group is part of one of @maartenbreddels's PRs; I do not recall exactly why it was not merged. Perhaps it was part of some wider functionality.
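In the meantime, a per-group rank can be sketched on plain NumPy arrays, outside of any groupby machinery. This is not a vaex API: `grouped_rank_first` is a hypothetical helper name, it assumes numeric values that fit in memory, and it only mimics the `method='first'` tie handling from the question.

```python
import numpy as np

def grouped_rank_first(groups, values, ascending=True):
    """Hypothetical helper: 1-based rank of `values` within each group.

    Ties are broken by order of appearance (like pandas method='first').
    Assumes `values` is numeric so that it can be negated for descending order.
    """
    groups = np.asarray(groups)
    values = np.asarray(values)
    keys = values if ascending else -values

    # np.lexsort is a stable indirect sort; the LAST key is the primary one,
    # so rows are ordered by group first, then by value within each group.
    order = np.lexsort((keys, groups))
    sorted_groups = groups[order]

    # Mark the first row of each run of identical group labels.
    is_start = np.r_[True, sorted_groups[1:] != sorted_groups[:-1]]
    start_idx = np.flatnonzero(is_start)
    group_id = np.cumsum(is_start) - 1

    # Position of each sorted row inside its group, counted from 1.
    pos = np.arange(len(order)) - start_idx[group_id] + 1

    # Scatter the positions back to the original row order.
    ranks = np.empty(len(order), dtype=np.int64)
    ranks[order] = pos
    return ranks

ranks = grouped_rank_first(["a", "a", "a", "b", "b"], [10, 30, 30, 5, 7],
                           ascending=False)
print(ranks)
```

The output has the same length as the input, with ranks restarting at 1 for each group.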
Hey @maartenbreddels, the .rank() method is kind of like .argsort(), with the difference that you get to decide how to deal with ties: you can take the average of the tied ranks, the highest, or the lowest, and there might be some other options as well. For example, ranking [5, 1, 2, 2] gives [4, 1, 2, 2], assuming ties are assigned the lowest rank. It would be nice if we could support this indeed (as a method outside of groupby).
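The tie-handling options can be illustrated with a small pure-Python sketch. The function name `rank` and the method names (`"min"`, `"max"`, `"average"`, `"first"`) are chosen here to mirror the pandas options; this is an illustration of the idea, not how pandas or vaex implement it.

```python
def rank(values, method="min"):
    """Return 1-based ranks of `values`, resolving ties according to `method`."""
    # Stable argsort: indices of `values` in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)

    start = 0
    while start < len(order):
        # Extend `end` over the run of equal values beginning at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1

        # Sorted positions start..end (0-based) are tied;
        # they occupy the 1-based ranks start+1 .. end+1.
        for offset, idx in enumerate(order[start:end + 1]):
            if method == "min":
                ranks[idx] = start + 1
            elif method == "max":
                ranks[idx] = end + 1
            elif method == "average":
                ranks[idx] = (start + 1 + end + 1) / 2
            elif method == "first":
                ranks[idx] = start + 1 + offset
            else:
                raise ValueError(f"unknown method: {method!r}")
        start = end + 1
    return ranks

data = [5, 1, 2, 2]
print(rank(data, "min"))      # [4, 1, 2, 2]
print(rank(data, "max"))      # [4, 1, 3, 3]
print(rank(data, "average"))  # [4, 1, 2.5, 2.5]
print(rank(data, "first"))    # [4, 1, 2, 3]
```

Only the two tied elements change between methods; the untied elements get the same rank in every case.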
Thank you! Implementing .rank would be very useful.
Hi all, just wanted to know whether a rank function based on a column has been implemented?
Any update on that?
Hello, any updates on adding a rank function to calculate rank within groups?