algebird icon indicating copy to clipboard operation
algebird copied to clipboard

Discrepancy in behavior of QTree[Long] and QTree[Double] for quantileBounds

Open piyushnarang opened this issue 8 years ago • 5 comments

Been poking arond with the Algebird 0.12.0 release (which included fix: https://github.com/twitter/algebird/pull/472/files). Noticed that the behavior for QTree[Long] and QTree[Double] is fairly different when you try to pick up the 0.5 quantile (median):

val longList = List(1L, 2L)
val qtreeSemigroup = new QTreeSemigroup[Long](6)
val qt = longList.map{ QTree(_) }.reduce{ qtreeSemigroup.plus(_, _) }
qt.quantileBounds(0.5)

res1: (Double, Double) = (2.0,3.0)

This seems incorrect. When I try the same as doubles:

val doubleList = List(1.0, 2.0)
val qtreeSemigroup = new QTreeSemigroup[Double](6)
val qt = doubleList.map{ QTree(_) }.reduce{ qtreeSemigroup.plus(_, _) }
qt.quantileBounds(0.5)

res1: (Double, Double) = (2.0,2.0000152587890625)

instead of 2.0 -> 3.0 the range is 2.0 -> 2.0 which seems more sane.

(See a similar issue when I have a list of 1L, 2L, 3L -> median is (2.0, 3.0) instead of (2.0, 2.0000152587890625) when the list is 1.0, 2.0, 3.0 ).

Wondering if the above behavior difference is expected? Or is this an issue we should probably explore and fix?

piyushnarang avatar Mar 18 '16 21:03 piyushnarang

@avibryant - @johnynek mentioned that you might know more so tagging you on this.

piyushnarang avatar Mar 18 '16 21:03 piyushnarang

Quick pointer: this is related to the level parameter in https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/QTree.scala#L45. For a Double this defaults to -16 (which is to say, the smallest resolution is 2^-16). For a Long this defaults to 0.

This issue highlights that this is maybe a bad idea, or maybe there's something else in the semantics of these bounds that needs fixing.

avibryant avatar Mar 18 '16 21:03 avibryant

Yep. If instead of QTree(_) you use:

QTree(131072,-16,1,_,None,None)

The results will be the same (the result you're seeing for Double right now).

jnievelt avatar Mar 18 '16 21:03 jnievelt

Yeah that makes sense. I can keep reducing the resolution (from 2^-16 to 2^-1) and the range gets wider:

val qt = longList.map{ QTree(32,-4,1,_,None,None) }.reduce{ qtreeSemigroup.plus(_, _) }
scala> qt.quantileBounds(0.5)
res12: (Double, Double) = (2.0,2.0625)
...
scala> val qt = longList.map{ QTree(4,-1,1,_,None,None) }.reduce{ qtreeSemigroup.plus(_, _) }
qt: com.twitter.algebird.QTree[Long] = (4,-1,2,3,None,None)

scala> qt.quantileBounds(0.5)
res16: (Double, Double) = (2.0,2.5)

piyushnarang avatar Mar 18 '16 21:03 piyushnarang

Some beginners bug these are. I do not understand this correction of instantiating Qtree like : QTree(2^16 *2,-16,1,_,None,None)

Maybe this is the syntax? somelist.map(x =>QTree(x*65536,-16,1,x,None,None)) for 16 digits precision

nuria avatar Feb 08 '18 21:02 nuria