Spark.TableStatsExample issues

Handle the insert of the maxSize(th) element for the first time and subsequent updates

When you insert the maxSize(th) value for the first time, update the lowest count and add the element as well. When modifying the TopNList, just perform in-place updates...

carlnayak
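
The behaviour described in this entry can be illustrated with a minimal sketch. It is not the repository's TopNList: the class `TopNSketch`, its fields, and the demo object are hypothetical names, and the element type is fixed to `String` for brevity. The sketch assumes the intended rule is that every insert up to and including the maxSize(th) one stores the element and keeps the lowest count current, and that later inserts overwrite the current minimum in place.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified top-N tracker; not the repository's actual TopNList.
class TopNSketch(val maxSize: Int) {
  require(maxSize > 0, "maxSize must be positive")

  // (value, count) pairs currently retained, in slot order
  val entries: ArrayBuffer[(String, Long)] = ArrayBuffer.empty[(String, Long)]
  private var lowestIndex = -1
  private var lowestCount = Long.MaxValue

  // Rescan the retained entries to locate the smallest count.
  private def refreshLowest(): Unit = {
    lowestIndex = -1
    lowestCount = Long.MaxValue
    for (i <- entries.indices) {
      if (entries(i)._2 < lowestCount) {
        lowestCount = entries(i)._2
        lowestIndex = i
      }
    }
  }

  def add(value: String, count: Long): Unit = {
    if (entries.size < maxSize) {
      // Still filling up (this includes the maxSize-th insert):
      // store the element and keep the lowest count up to date.
      entries += ((value, count))
      if (count < lowestCount) {
        lowestCount = count
        lowestIndex = entries.size - 1
      }
    } else if (count > lowestCount) {
      // Full: overwrite the current minimum in place, then find the new minimum.
      entries(lowestIndex) = (value, count)
      refreshLowest()
    }
  }
}

object TopNSketchDemo {
  def run(): List[(String, Long)] = {
    val top = new TopNSketch(2)
    top.add("a", 5L)
    top.add("b", 1L) // the maxSize-th insert: stored, lowest count becomes 1
    top.add("c", 3L) // overwrites ("b", 1) in place
    top.add("d", 2L) // ignored: 2 is not above the current lowest count of 3
    top.entries.toList
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Tracking the index of the lowest count keeps the common case (a count that does not beat the minimum) to a single comparison; the rescan only happens when an entry is actually replaced.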

Fixed a bug in the minLong calculation

The minLong calculation originally took the max of the min values instead of the min.

carlnayak
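
As a rough illustration of the kind of mistake this entry describes, the sketch below combines two partial min/max results. It assumes the bug sat in the step that merges partial statistics; `LongRange`, `observe`, `merge`, and the demo object are hypothetical names and not the repository's ColumnStats API.

```scala
// Hypothetical, pared-down stand-in for a per-column statistics holder.
final case class LongRange(minLong: Long = Long.MaxValue, maxLong: Long = Long.MinValue) {

  // Fold a single observed value into the running range.
  def observe(v: Long): LongRange =
    LongRange(math.min(minLong, v), math.max(maxLong, v))

  // Combine two partial results, for example from different partitions.
  def merge(other: LongRange): LongRange =
    LongRange(
      math.min(minLong, other.minLong), // min of the mins; math.max here would be the bug
      math.max(maxLong, other.maxLong)
    )
}

object LongRangeDemo {
  def run(): LongRange = {
    val left  = Seq(7L, 3L).foldLeft(LongRange())((acc, v) => acc.observe(v))
    val right = Seq(10L, 4L).foldLeft(LongRange())((acc, v) => acc.observe(v))
    left.merge(right)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

With `math.max` on the first field, the merged minimum in this example would come out as 4 rather than 3, which is why the error only shows up once more than one partial result is combined.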

Fix for the TopNList add method

The original add method only added the first N key-value pairs encountered.

finleysg

sumLong bug in ColumnStats.scala and TestTableStatsSinglePathMain.scala

Thanks for sharing, this performs significantly better than what I was using! While validating the getFirstPassStat statistics on our data I discovered a sumLong bug in ColumnStats.scala [Part B.1.1](http://blog.cloudera.com/blog/2015/07/how-to-do-data-quality-checks-using-apache-spark-dataframes/comment-page-1/#comment-74803). [ColumnStats.scala](https://github.com/tmalaska/Spark.TableStatsExample/blob/master/src/main/scala/com/cloudera/sa/examples/tablestats/model/ColumnStats.scala)...

BrentDorsey

Fixes

nothing major here. just some suggestions, you don't need to like them all :). I broke it up into a few commits, might be easier to look at one commit...

squito
