stream-lib

Possibly out of range

Open: ahmadpriatama opened this issue 9 years ago • 1 comment

I'm using the com.clearspring.analytics.stream.quantile.QDigest class to approximate about 100k data points, and summing them may produce a result beyond the int64 range. I found this when running on Amazon EMR:

Caused by: java.lang.IllegalArgumentException: Can only accept values in the range 0..4611686018427387903, got 9223372036854775807
    at com.clearspring.analytics.stream.quantile.QDigest.offer(QDigest.java:125)
    at com.liveramp.cascading_ext.combiner.lib.QuantileExactAggregator.partialAggregate(QuantileExactAggregator.java:38)
    at com.liveramp.cascading_ext.combiner.lib.QuantileExactAggregator.partialAggregate(QuantileExactAggregator.java:17)
    at com.liveramp.cascading_ext.combiner.CombinerFunctionContext.combineAndEvict(CombinerFunctionContext.java:130)
    at com.liveramp.cascading_ext.combiner.CombinerFunction.operate(CombinerFunction.java:130)
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99)
    ... 11 more

I suppose this is because the offer method's parameter is defined as long. Is there any workaround for this?
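For reference, the two numbers in the exception message can be checked with plain arithmetic: the upper bound equals Long.MAX_VALUE / 2 (that is, 2^62 - 1), and the rejected value is exactly Long.MAX_VALUE. The sketch below only verifies that arithmetic; the class name and the inAcceptedRange helper are hypothetical and are not part of stream-lib.

```java
public class QDigestRangeCheck {
    // Upper bound copied from the exception message in this issue.
    public static final long MAX_ACCEPTED = 4611686018427387903L;

    // Hypothetical helper: true if the value lies in the range the message reports (0..MAX_ACCEPTED).
    public static boolean inAcceptedRange(long value) {
        return value >= 0 && value <= MAX_ACCEPTED;
    }

    public static void main(String[] args) {
        // The bound is half of the largest long, i.e. 2^62 - 1.
        System.out.println(MAX_ACCEPTED == Long.MAX_VALUE / 2);
        System.out.println(MAX_ACCEPTED == (1L << 62) - 1);
        // The rejected value from the stack trace is exactly Long.MAX_VALUE.
        System.out.println(9223372036854775807L == Long.MAX_VALUE);
        System.out.println(inAcceptedRange(Long.MAX_VALUE));
    }
}
```

So the parameter type long can represent the offered value, but the value is about twice the largest one the message says is accepted.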

ahmadpriatama, Apr 16 '15 02:04

Q-digest will cost more on every access if it uses long internally.

Having high resolution inputs is something that t-digest specifically excels at. I think some version of t-digest is included in stream-lib. Recent versions are very fast and beat Q-digest accuracy dramatically, especially for high resolution inputs, for dramatic skew and for tail quantiles (which is what almost everybody wants).
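If switching digests is not possible right away, one possible stopgap is to check each value against the range from the exception message before calling offer, and to count what gets skipped. This is only a sketch under assumptions: GuardedOffer is a hypothetical wrapper, and the LongConsumer stands in for the real call (in practice something like digest::offer on a QDigest instance). Skipping values changes the resulting quantile estimate, so whether dropping is acceptable depends on the data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

public class GuardedOffer {
    // Upper bound copied from the exception message in this issue.
    public static final long MAX_ACCEPTED = 4611686018427387903L;

    private final LongConsumer sink; // stand-in for the real offer call
    private long rejected = 0;

    public GuardedOffer(LongConsumer sink) {
        this.sink = sink;
    }

    // Forwards in-range values to the sink; counts and skips the rest instead of throwing.
    public boolean offer(long value) {
        if (value < 0 || value > MAX_ACCEPTED) {
            rejected++;
            return false;
        }
        sink.accept(value);
        return true;
    }

    public long rejectedCount() {
        return rejected;
    }

    public static void main(String[] args) {
        List<Long> accepted = new ArrayList<>();
        GuardedOffer guard = new GuardedOffer(v -> accepted.add(v));
        guard.offer(42L);            // forwarded
        guard.offer(Long.MAX_VALUE); // skipped, as in the stack trace above
        System.out.println(accepted.size() + " accepted, " + guard.rejectedCount() + " rejected");
    }
}
```

Logging the rejected count makes it easier to tell whether out-of-range values are rare outliers or a sign of a problem upstream.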

View this issue on GitHub: https://github.com/addthis/stream-lib/issues/90

tdunning, Apr 16 '15 06:04