Mistake in userguide on reducing / binning
Hi again, a second issue I ran into is related to the userguide: the example under "Grouping on calculated columns" that deals with binning doesn't compile in v0.37.3, and apart from that it doesn't lead to a reasonable result, as far as I can see.
- Compilation isn't possible, as bin() returns a DoubleColumn (even if called on an IntColumn), and summarize(.).by(.) demands a CategoricalColumn, which DoubleColumn is not.
- The default implementation of bin(.) in NumberMapFunctions returns the number of occurrences for each bin, but does not map this to the rows. To avoid an Exception at runtime about a differing number of rows, you would need to pass the number of rows to the bin(.) method as its parameter, which leads to an enormous number of bins. On top of that, more than one bin can have the same number of rows mapped into it, so it's not a reasonable mapping.
(I achieved this binning functionality by dividing, rounding and multiplying a column, then converting it to int so it can be used as a categorical column.)
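To make the divide / round / multiply idea concrete, here is a minimal sketch in plain Java. It deliberately avoids the Tablesaw API, so the class and method names (BinningSketch, binLowerBound, countPerBin) are hypothetical, and it assumes "rounding" means rounding down with Math.floor so that every row gets the lower bound of its fixed-width bin as an int label.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Tablesaw: fixed-width binning on plain arrays.
class BinningSketch {

    // Divide, round down, multiply: returns the lower bound of the bin
    // that contains value, as an int usable as a category label.
    static int binLowerBound(double value, int binWidth) {
        return (int) Math.floor(value / binWidth) * binWidth;
    }

    // Groups the rows by their bin label and counts the rows per bin.
    static Map<Integer, Integer> countPerBin(double[] values, int binWidth) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (double v : values) {
            counts.merge(binLowerBound(v, binWidth), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] ages = {3, 7, 12, 14, 18, 21, 24};
        // prints {0=2, 10=3, 20=2}
        System.out.println(countPerBin(ages, 10));
    }
}
```

The point of the sketch is that each row gets exactly one label, so the label column has as many entries as the table has rows and can serve as a grouping key, which the per-bin counts cannot.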
Apart from this, as a side note: I know it's hard, and from a developer's perspective it's usually the least relevant (but most time-consuming!) task, but from a user's perspective a little more documentation (javadoc) would be very handy in many cases... ;-)
Thanks again. :-)
Yeah, the docs could use some work. It was hard to make sure the code in the docs compiled, so we wrote something that will allow you to embed code snippets that get compiled to ensure they work. For example:
https://github.com/jtablesaw/tablesaw/blob/master/docs-src/main/gettingstarted.md
https://github.com/jtablesaw/tablesaw/blob/master/docs-src/src/main/java/tech/tablesaw/docs/GettingStarted.java
We've only converted some of the docs so far, though, so there are probably still a number of broken examples. We'd appreciate any help you want to lend in fixing up the docs.