Statistics.jl

median should take by/lt arguments similar to sort

Open: stevengj opened this issue 8 years ago • 12 comments

I just found myself needing to take the median of some function of an array, and found that unfortunately the median function does not take a by keyword analogous to sort. This would be nice to have.
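
For illustration, a minimal sketch of the asymmetry described above, using only `sort` and `median` as they exist today. The array `x` and the choice of `abs` as the function are made up for the example.

```julia
using Statistics

x = [-7, 2, -1, 4, 3]

# sort already accepts a `by` function: elements are ordered by abs(value).
sorted_by_abs = sort(x, by=abs)      # [-1, 2, 3, 4, -7]

# Workaround for median: transform first, then take the median.
# This allocates a temporary array and returns a transformed value,
# not an element of x.
m = median(abs.(x))                  # 3.0
```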

stevengj, Nov 02 '16 12:11

Might make sense to have this for quantile too.

simonster, Nov 02 '16 13:11

Anything order-based, in fact. We should make sure that all order-related functions have this.

StefanKarpinski, Nov 02 '16 13:11

Couldn't generators replace this feature in a consistent fashion for all functions?
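
A short sketch of the generator approach, assuming the generic iterator methods of `median` and `quantile` in Statistics (which collect the iterator first). The data is made up. Note that the result is the median of the transformed values, which may not be what a `by` keyword would return.

```julia
using Statistics

x = [-3, 1, 5]

# The generator is collected into [3, 1, 5] before the median is taken.
m = median(abs(v) for v in x)             # 3.0

# The same pattern works for quantile.
q = quantile((abs(v) for v in x), 0.5)    # 3.0

# The value 3.0 is a transformed key; the element that produced it is -3.
```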

nalimilan, Nov 02 '16 14:11

I thought we were just going to pass an ordering function? For type stability that can't be a keyword, though.

timholy, Nov 02 '16 14:11

@nalimilan, generators can't be used for an in-place function like median!.
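
To make the point concrete, a small sketch assuming `median!` is only defined for `AbstractVector` arguments, as in the Statistics versions this was written against. A generator has no storage to reorder, so the in-place call has nothing to work on.

```julia
using Statistics

v = [5, 1, 4, 2, 3]
m = median!(v)            # 3.0; v may be reordered in place

# A generator is not an AbstractVector, so this is expected to
# raise a MethodError rather than compute anything.
g = (abs(t) for t in v)
threw = try
    median!(g)
    false
catch err
    err isa MethodError
end
```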

stevengj, Nov 02 '16 14:11

If x is a collection and foo is a function then we can sort with sort!(x,by=foo). For simplicity let n=length(x) be odd. My candidate for the median is x[n>>1 + 1] (and not foo(x[n>>1 + 1])). I don't see a type stability issue with the former.
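
A minimal sketch of this candidate, for odd lengths only. `median_by` is a hypothetical name, not part of Statistics, and it sorts a copy rather than using `sort!` so the input is left untouched.

```julia
# Hypothetical helper: returns the element of x whose key f(x[i])
# sits in the middle of the ordering, not the key itself.
function median_by(f, x::AbstractVector)
    n = length(x)
    isodd(n) || throw(ArgumentError("this sketch only handles odd lengths"))
    y = sort(x, by=f)
    return y[n >> 1 + 1]   # >> binds tighter than +, so this is (n >> 1) + 1
end

r = median_by(abs, [-3, 1, 5])   # -3, an element of the input
```

Because the result is an element of `x`, its type is simply `eltype(x)` regardless of what `f` returns.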

I'm wondering if we should change the calculation of middle for non-Number types such that median(x) in x since it might be possible to sort things for which you cannot compute averages, e.g. ['a','b','c','d']. This is even more important with a by keyword. Technically, both 'b' and 'c' are medians so we'd need to figure out which of them we prefer.
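
A sketch of the two candidates for an even-length collection of non-numeric elements. The names `lower` and `upper` are illustrative only; which of them `median` should return is exactly the open question.

```julia
x = ['d', 'a', 'c', 'b']
y = sort(x)               # ['a', 'b', 'c', 'd']
n = length(y)

lower = y[(n + 1) >> 1]   # 'b'
upper = y[n >> 1 + 1]     # 'c'

# For odd n both indices coincide, so the choice only matters
# when the length is even.
```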

Updated to make sense.

andreasnoack, Nov 02 '16 14:11

I guess if you're returning the un-fooed value (which, duh, is what we do), then you're right that type stability isn't crucial. (You'll feel the lack for small inputs, though.)

timholy, Nov 02 '16 14:11

Regarding type stability problems with the keywords - is it better to propagate the by keyword or the By type (and friends) to other functions throughout Base? Will keyword argument type stability be fixed by 0.6? I kind-of liked By... but I see it might just be a workaround.
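
For reference, a small sketch of the By ordering type mentioned above, as exposed by `Base.Order` and accepted by `sort` through its `order` keyword. Whether other functions should take such an ordering object instead of a `by` keyword is the design question here; the sketch only shows that the two spellings agree for `sort`.

```julia
using Base.Order: By, lt

x = [-7, 2, -1, 4, 3]
o = By(abs)                 # ordering object that compares abs(a) with abs(b)

a = sort(x, order=o)        # [-1, 2, 3, 4, -7]
b = sort(x, by=abs)         # same result via the keyword

is_less = lt(o, -1, 2)      # true, since abs(-1) < abs(2)
```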

andyferris, Nov 02 '16 23:11

@ALL @timholy @stevengj @StefanKarpinski @andreasnoack Is this issue still open? Can I work on this?

christianbender, Jul 05 '18 18:07

Please don't ping excessively.

Anyone can work on any issue.

KristofferC, Jul 05 '18 18:07

Seems reasonable to me to ask if one is new to the project.

@christianbender You may find it easier to ask such questions on the project slack and even discuss as you work on a solution.

ViralBShah, Jul 05 '18 18:07

@KristofferC Sorry for the annoyance.

@ViralBShah Thanks for the help.

christianbender, Jul 05 '18 19:07