DataFrames.jl
Consider adding maximum and minimum
The implementation would be:
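```julia
# `df[!, col]` looks up the column named by `col`; `df.col` would instead
# look for a column literally called `col`.
Base.maximum(df::AbstractDataFrame, col::ColumnIndex) = df[argmax(df[!, col]), :]
Base.minimum(df::AbstractDataFrame, col::ColumnIndex) = df[argmin(df[!, col]), :]
# Grouped versions apply the row-level method to each group.
Base.maximum(gdf::GroupedDataFrame, col::ColumnIndex) = combine(gdf, sdf -> maximum(sdf, col))
Base.minimum(gdf::GroupedDataFrame, col::ColumnIndex) = combine(gdf, sdf -> minimum(sdf, col))
```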
I have added `GroupedDataFrame` versions, but maybe they should be dropped and users should be required to write `combine(gdf, sdf -> maximum(sdf, col))` explicitly (since `GroupedDataFrame` is iterable)?
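For comparison, this is roughly what the explicit spelling looks like without dedicated `GroupedDataFrame` methods. It is a sketch using only existing DataFrames.jl functions; the toy data is made up, and the grouping column is excluded from the returned row (`Not(:g)`) so that `combine` does not receive it twice.

```julia
using DataFrames

# Toy data (illustrative only): two groups, :x is the column to maximize.
df = DataFrame(g = [1, 1, 2, 2], x = [3, 5, 4, 2], y = ["a", "b", "c", "d"])
gdf = groupby(df, :g)

# Per group, take the first row holding the largest :x.
# Not(:g) drops the grouping column, which combine adds back as a key.
res = combine(gdf, sdf -> sdf[argmax(sdf.x), Not(:g)])
```

The result has one row per group (`x == [5, 4]`, `y == ["b", "c"]`), which is what the proposed `maximum(gdf, :x)` would return.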
It is probably also worth adding a keyword argument that decides whether a single row or all matching rows are kept (and a `view` keyword argument that decides whether, when multiple rows are returned, a view or a copy is produced).
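A minimal sketch of how those two keyword arguments could interact. The function name `maxrows` and the keyword names `ties` and `view` are hypothetical and not part of DataFrames.jl; the sketch also assumes the column contains no `missing` values.

```julia
using DataFrames

# Hypothetical helper (not DataFrames.jl API): row(s) of `df` holding the
# maximum of column `col`. Assumes `col` has no missing values.
#   ties=true  -> keep every row tied at the maximum (returns a data frame)
#   ties=false -> keep only the first such row (returns a DataFrameRow)
#   view=true  -> with ties=true, return a SubDataFrame instead of a copy
function maxrows(df::AbstractDataFrame, col; ties::Bool=true, view::Bool=false)
    v = df[!, col]
    if !ties
        return df[argmax(v), :]        # a DataFrameRow is already a view
    end
    m = maximum(v)
    idx = findall(==(m), v)            # indices of all tied rows
    # The `view` keyword shadows Base.view here, so call it explicitly.
    return view ? Base.view(df, idx, :) : df[idx, :]
end
```

With `df = DataFrame(x = [1, 3, 3], y = ["a", "b", "c"])`, `maxrows(df, :x)` returns both rows with `x == 3`, while `maxrows(df, :x; ties=false)` returns only the row with `y == "b"`.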
This gets complicated, so I am marking it for the 1.5 release to give us time to think about it without rushing 😄.
This would essentially be equivalent to `slice_max` in dplyr.
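To make the comparison concrete, here is a rough sketch of a `slice_max`-like operation with an `n` argument that keeps ties, in the spirit of dplyr's default `with_ties = TRUE` (so it can return more than `n` rows). The name `slice_max_sketch` is hypothetical; unlike dplyr, it keeps the original row order rather than sorting by the column, and it assumes no `missing` values.

```julia
using DataFrames

# Hypothetical sketch (not DataFrames.jl API): keep every row whose value in
# `col` is at least the n-th largest value, so ties may yield more than n rows.
# Rows stay in their original order. Assumes `col` has no missing values.
function slice_max_sketch(df::AbstractDataFrame, col; n::Integer=1)
    n >= 1 || throw(ArgumentError("n must be positive"))
    v = df[!, col]
    isempty(v) && return df[Int[], :]
    cutoff = sort(v; rev=true)[min(n, length(v))]   # n-th largest value
    return df[v .>= cutoff, :]
end
```

For `df = DataFrame(id = 1:5, x = [2, 5, 3, 5, 1])`, both `n = 1` and `n = 2` return the two rows with `x == 5`, and `n = 3` additionally returns the row with `x == 3`.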
A major difference between the standard `maximum` method on collections and the one discussed here concerns ties: rows that match the maximum on the passed columns are not necessarily equal on the other columns, so returning only one of them silently drops information. It could therefore be safer to return all tied rows by default (with a keyword argument to disable this), so that users have to take ties into account.
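A small illustration of the tie problem, using only `argmax`, `findall`, and ordinary indexing on made-up data: `argmax` returns the first maximal index only, so a second row with the same score but a different name is lost.

```julia
using DataFrames

# Two rows tie on :score but differ in :name.
df = DataFrame(name = ["ann", "bob", "cat"], score = [7, 9, 9])

# argmax picks only the first maximal index, silently dropping "cat".
first_only = df[argmax(df.score), :]

# Selecting every index equal to the maximum keeps both tied rows.
all_tied = df[findall(==(maximum(df.score)), df.score), :]
```

Here `first_only.name` is `"bob"`, while `all_tied.name` is `["bob", "cat"]`.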