algebird
algebird copied to clipboard
Discrepancy in behavior of QTree[Long] and QTree[Double] for quantileBounds
Been poking arond with the Algebird 0.12.0 release (which included fix: https://github.com/twitter/algebird/pull/472/files). Noticed that the behavior for QTree[Long] and QTree[Double] is fairly different when you try to pick up the 0.5 quantile (median):
val longList = List(1L, 2L)
val qtreeSemigroup = new QTreeSemigroup[Long](6)
val qt = longList.map{ QTree(_) }.reduce{ qtreeSemigroup.plus(_, _) }
qt.quantileBounds(0.5)
res1: (Double, Double) = (2.0,3.0)
This seems incorrect. When I try the same as doubles:
val doubleList = List(1.0, 2.0)
val qtreeSemigroup = new QTreeSemigroup[Double](6)
val qt = doubleList.map{ QTree(_) }.reduce{ qtreeSemigroup.plus(_, _) }
qt.quantileBounds(0.5)
res1: (Double, Double) = (2.0,2.0000152587890625)
instead of 2.0 -> 3.0 the range is 2.0 -> 2.0 which seems more sane.
(See a similar issue when I have a list of 1L, 2L, 3L -> median is (2.0, 3.0) instead of (2.0, 2.0000152587890625) when the list is 1.0, 2.0, 3.0 ).
Wondering if the above behavior difference is expected? Or is this an issue we should probably explore and fix?
@avibryant - @johnynek mentioned that you might know more so tagging you on this.
Quick pointer: this is related to the level parameter in https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/QTree.scala#L45. For a Double
this defaults to -16
(which is to say, the smallest resolution is 2^-16). For a Long
this defaults to 0
.
This issue highlights that this is maybe a bad idea, or maybe there's something else in the semantics of these bounds that needs fixing.
Yep. If instead of QTree(_) you use:
QTree(131072,-16,1,_,None,None)
The results will be the same (the result you're seeing for Double right now).
Yeah that makes sense. I can keep reducing the resolution (from 2^-16 to 2^-1) and the range gets wider:
val qt = longList.map{ QTree(32,-4,1,_,None,None) }.reduce{ qtreeSemigroup.plus(_, _) }
scala> qt.quantileBounds(0.5)
res12: (Double, Double) = (2.0,2.0625)
...
scala> val qt = longList.map{ QTree(4,-1,1,_,None,None) }.reduce{ qtreeSemigroup.plus(_, _) }
qt: com.twitter.algebird.QTree[Long] = (4,-1,2,3,None,None)
scala> qt.quantileBounds(0.5)
res16: (Double, Double) = (2.0,2.5)
Some beginners bug these are. I do not understand this correction of instantiating Qtree like : QTree(2^16 *2,-16,1,_,None,None)
Maybe this is the syntax? somelist.map(x =>QTree(x*65536,-16,1,x,None,None)) for 16 digits precision