Robert (Bobby) Evans

Results: 186 comments by Robert (Bobby) Evans

I still want this. It would be a good performance boost for anyone in SQL doing a first-like or last-like operation.

> The histogram is small enough it probably makes sense to just copy it to the CPU and do this step serially.

That is true if we are doing a...

From what I saw in the code, Spark uses a long as the word size, but appears to have 6 bits per histogram bucket. https://github.com/apache/spark/blob/4835946de2ef71b176da5106e9b6c2706e182722/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/HyperLogLogPlusPlusHelper.scala#L271 https://github.com/apache/spark/blob/4835946de2ef71b176da5106e9b6c2706e182722/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/HyperLogLogPlusPlusHelper.scala#L78-L85 I think what...
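To make the "long word, 6 bits per bucket" layout concrete, here is a minimal sketch of how 6-bit buckets could be packed into 64-bit words. It is an illustration of the idea in the comment above, not a copy of Spark's code: the names (`get_register`, `set_register`, `num_words`) and the choice to fill each word from the low bits upward are assumptions, and 10 buckets per word simply follows from 64 // 6, which leaves 4 bits of each word unused.

```python
REGISTER_SIZE = 6                                  # bits per bucket, per the comment above
WORD_SIZE = 64                                     # width of a JVM long
REGISTERS_PER_WORD = WORD_SIZE // REGISTER_SIZE    # 10 buckets per word, 4 bits unused
REGISTER_MASK = (1 << REGISTER_SIZE) - 1           # 0b111111


def num_words(num_registers):
    """Number of 64-bit words needed to hold num_registers buckets."""
    return -(-num_registers // REGISTERS_PER_WORD)  # ceiling division


def get_register(words, idx):
    """Read the 6-bit value of bucket idx (assumed low-bits-first layout)."""
    word_offset, slot = divmod(idx, REGISTERS_PER_WORD)
    shift = slot * REGISTER_SIZE
    return (words[word_offset] >> shift) & REGISTER_MASK


def set_register(words, idx, value):
    """Overwrite the 6-bit value of bucket idx in place."""
    if not 0 <= value <= REGISTER_MASK:
        raise ValueError("bucket value must fit in 6 bits")
    word_offset, slot = divmod(idx, REGISTERS_PER_WORD)
    shift = slot * REGISTER_SIZE
    cleared = words[word_offset] & ~(REGISTER_MASK << shift)  # zero the old bits
    words[word_offset] = cleared | (value << shift)


# Hypothetical usage: 25 buckets need 3 words.
words = [0] * num_words(25)
set_register(words, 0, 63)    # first slot of word 0
set_register(words, 9, 5)     # last slot of word 0
set_register(words, 10, 17)   # first slot of word 1
```

Under this layout a bucket index maps to a word with integer division and to a bit offset with the remainder, so merging two sketches means walking each word and comparing its 10 slots pairwise.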

That sounds great.

> Open question if this ctor should enforce each list being the same size (requires a kernel)

I thought in general if it made the code slower...

Yes, we are all having the same issue. There appears to be something odd happening with Scala 2.11. If you upgrade all of the versions in the pom.xml, not just...

Not yet. We ran the multi-node setup manually and added the single-node script after the fact to capture what we had done and make it simpler to try...