DataFrames.jl

Consider adding maximum and minimum

Open bkamins opened this issue 2 years ago • 3 comments

The implementation would be:

maximum(df::AbstractDataFrame, col::ColumnIndex) = df[argmax(df[!, col]), :]
minimum(df::AbstractDataFrame, col::ColumnIndex) = df[argmin(df[!, col]), :]
maximum(gdf::GroupedDataFrame, col::ColumnIndex) = combine(gdf, sdf -> maximum(sdf, col))
minimum(gdf::GroupedDataFrame, col::ColumnIndex) = combine(gdf, sdf -> minimum(sdf, col))

bkamins • Jun 20 '22 09:06

I have added GroupedDataFrame versions, but maybe they should be dropped, and combine(gdf, sdf -> maximum(sdf, col)) should be required explicitly (as GroupedDataFrame is iterable)?

bkamins • Jun 20 '22 09:06

Also, it is probably worth adding a kwarg that decides whether a single row or multiple rows are kept (and also a view kwarg that decides whether, in the multiple-row case, a view or a copy is returned).

It gets complicated. I am marking it for the 1.5 release so that we can think about it without rushing 😄.

bkamins • Jun 20 '22 09:06

This would essentially be equivalent to slice_max in dplyr.

A major difference between the standard maximum method on collections and the one discussed here is that, when there are ties, the rows that match the maximum on the passed columns are not necessarily equal on the other columns. So it could be safer to return all tied rows by default (with a keyword argument to disable this) to ensure users take this into account.

nalimilan • Jun 20 '22 09:06
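
To make the tie-handling question in the thread concrete, here is a minimal sketch of what returning all tied rows by default could look like. The function name rows_max and the keyword with_ties are hypothetical (with_ties is borrowed from dplyr's slice_max); neither is part of DataFrames.jl, and nothing here is the design that was adopted. The sketch also assumes the column contains no missing values.

```julia
using DataFrames

# Hypothetical helper, not part of DataFrames.jl.
# with_ties=true keeps every row whose value in `col` equals the maximum;
# with_ties=false keeps only the first such row (as a one-row data frame).
# Assumption: the column has no missing values.
function rows_max(df::AbstractDataFrame, col; with_ties::Bool=true)
    vals = df[!, col]
    if with_ties
        m = maximum(vals)
        return df[isequal.(vals, m), :]
    else
        return df[[argmax(vals)], :]
    end
end

# Grouped variant: apply the same rule within each group.
rows_max(gdf::GroupedDataFrame, col; with_ties::Bool=true) =
    combine(gdf, sdf -> rows_max(sdf, col; with_ties=with_ties))

df = DataFrame(g = [1, 1, 2, 2], x = [3, 3, 1, 2], y = ["a", "b", "c", "d"])

tied      = rows_max(df, :x)                    # both rows with x == 3
single    = rows_max(df, :x; with_ties=false)   # only the first row with x == 3
per_group = rows_max(groupby(df, :g), :x)       # tied rows within each group
```

In this data the two rows with x == 3 differ in y, which is exactly the situation described above: keeping only one of them silently discards the other.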