☂️ Describe breaks on `Number` column (and other statistics inconsistencies)

This happens because the `Iterable<Number>.std()` function accepts `Number` but doesn't convert the values to `Double` (like `mean()` does).
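
A minimal sketch of the conversion the `std()` fix needs, assuming a hypothetical extension named `stdAsDouble` (not part of the library's API): every element is converted with `Number.toDouble()` before any arithmetic, so `Byte`, `Short`, `Long`, `Float` and plain `Number` inputs all take the same path. It computes the sample standard deviation (divisor n - 1) and skips nulls.

```kotlin
import kotlin.math.sqrt

// Hypothetical helper, not the library's API: sample standard deviation over
// any Number values. Each element is converted to Double first, the same way
// mean() handles Number, so a column typed as Number does not break.
fun Iterable<Number?>.stdAsDouble(): Double {
    // Skip nulls, then widen every Byte/Short/Int/Long/Float/Double/... to Double
    val values = filterNotNull().map { it.toDouble() }
    // Fewer than two values: the sample standard deviation is undefined
    if (values.size < 2) return Double.NaN
    val mean = values.average()
    val sumOfSquares = values.sumOf { (it - mean) * (it - mean) }
    // Sample standard deviation: divide by n - 1
    return sqrt(sumOfSquares / (values.size - 1))
}
```

For example, `listOf<Number>(1, 2.0, 3L, 4.0f, 5.toShort()).stdAsDouble()` mixes five numeric types and still yields a single `Double`.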

There are actually a couple more gaps:
- `cumSum`
  - Misses `Byte`, `Short`
  - Has `DataColumn` overloads but not `Iterable`/`Sequence`
- `mean`
  - Has overloads for `Sequence<Double | Float>` but not for other `Number` types
- `median`
  - Misses `Float`, `Byte`, `Short`, `Number` (it only works on `Comparable`)
  - Needs to handle other types consistently
  - No `Sequence` overloads
  - Has no `skipNA` option (if applicable)
- `min` and `max`
  - The internal `Iterable<T>.min` and `max` are not used and can be removed; the stdlib functions for `Comparable` sequences and iterables are used instead.
  - Misses `Number` (only works on `Comparable`)
- `std`
  - Breaks if type is `Number`
  - `Short` and `Byte` are cast to `Int`, which works but is a bit iffy
  - `Iterable` overloads missing for `Number`, `Short`, `Byte`
  - `Sequence` overloads missing
  - Nullable overloads missing for `Iterable` (and `Sequence`)
- `varianceAndMean`
  - Also provides a `std(ddof: Int)` function, without docs on what `ddof` even means, as well as `count`. Could have a better name. It can also produce nulls?? This screams for documentation.
  - Variance functions are missing on `DataColumn`s entirely (had to be added separately for Kandy)
  - Misses `Short`, `Byte`, `Number`, and nullable overloads
  - Misses `Sequence` overloads
- `sum`
  - Has `TODO`s where types are amiss
  - Misses `Float` (!), `Short`, `Byte`, `Number` in various `Iterable` overloads
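
On the `varianceAndMean` point: in NumPy and pandas, `ddof` stands for "delta degrees of freedom", the amount subtracted from the count in the divisor, so variance = sum of (x - mean)^2 / (n - ddof). `ddof = 0` gives the population variance and `ddof = 1` the sample variance. Assuming dataframe's parameter follows that convention, a documented result type could look like the sketch below. `VarianceAndMeanSketch` and `varianceAndMeanSketch()` are hypothetical names, and returning `null` when `n - ddof <= 0` is only one plausible explanation of where nulls can come from.

```kotlin
import kotlin.math.sqrt

/**
 * Hypothetical result type; names are illustrative, not the library's API.
 *
 * @property count number of non-null values
 * @property mean arithmetic mean of those values
 * @property sumOfSquaredDeviations sum of (x - mean)^2 over those values
 */
data class VarianceAndMeanSketch(
    val count: Int,
    val mean: Double,
    val sumOfSquaredDeviations: Double,
) {
    /**
     * Variance with "delta degrees of freedom": the divisor is count - ddof.
     * ddof = 0 gives the population variance, ddof = 1 the sample variance.
     * Returns null when count - ddof <= 0, since the divisor would not be positive.
     */
    fun variance(ddof: Int): Double? {
        val divisor = count - ddof
        return if (divisor <= 0) null else sumOfSquaredDeviations / divisor
    }

    /** Standard deviation: the square root of [variance] for the same ddof. */
    fun std(ddof: Int): Double? = variance(ddof)?.let { sqrt(it) }
}

// Two-pass computation over any Number values, skipping nulls
fun Iterable<Number?>.varianceAndMeanSketch(): VarianceAndMeanSketch {
    val values = filterNotNull().map { it.toDouble() }
    val mean = values.average() // NaN for an empty input
    val sumOfSquares = values.sumOf { (it - mean) * (it - mean) }
    return VarianceAndMeanSketch(values.size, mean, sumOfSquares)
}
```

With the values `2, 4, 4, 4, 5, 5, 7, 9`, `variance(0)` is `4.0` while `variance(1)` is `32.0 / 7`, which is exactly the kind of difference the missing docs should spell out.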

All are also missing `BigInteger`, as we're supporting `BigDecimal` too.
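
As a sketch of what `BigInteger` support could look like next to `BigDecimal`, the hypothetical `sumAsBigDecimal()` below (an illustration, not the library's API) widens every element to `BigDecimal` instead of going through `toDouble()`, so large `BigInteger` values keep full precision. Returning `BigDecimal` for every input type is an assumption made to keep the example short.

```kotlin
import java.math.BigDecimal
import java.math.BigInteger

// Hypothetical helper, not the library's API: exact sum over mixed Number values.
// Every element is widened to BigDecimal; nulls are skipped.
fun Iterable<Number?>.sumAsBigDecimal(): BigDecimal =
    filterNotNull().fold(BigDecimal.ZERO) { acc, n ->
        val asDecimal = when (n) {
            is BigDecimal -> n
            // BigInteger converts exactly; toDouble() would round large values
            is BigInteger -> BigDecimal(n)
            // Integral primitives fit in a Long without loss
            is Byte, is Short, is Int, is Long -> BigDecimal.valueOf(n.toLong())
            // Float/Double (and any other Number): BigDecimal.valueOf uses
            // Double.toString, so 0.1 becomes exactly 0.1 rather than its binary expansion
            else -> BigDecimal.valueOf(n.toDouble())
        }
        acc.add(asDecimal)
    }
```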

https://github.com/Kotlin/dataframe/issues/352 is probably the same problem.

As mentioned in https://github.com/Kotlin/dataframe/issues/543, some functions like `median(ints)` might return an unexpectedly rounded `Int`. It might be better to let all functions return `Double` and handle `BigInteger`/`BigDecimal` separately for now, as they're Java-specific.
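
A minimal sketch of the "always return `Double`" approach for `median`, using a hypothetical `medianAsDouble()` extension (not the library's API): values are converted to `Double` and, for an even count, the two middle values are averaged, so `[1, 2, 3, 4]` yields `2.5` instead of an integer. `BigInteger`/`BigDecimal` (and `Long` values beyond 2^53) would lose precision here and would need the separate handling suggested above.

```kotlin
// Hypothetical helper, not the library's API: median over any Number values
// that always returns Double. Nulls are skipped; an empty input gives NaN.
fun Iterable<Number?>.medianAsDouble(): Double {
    val sorted = filterNotNull().map { it.toDouble() }.sorted()
    if (sorted.isEmpty()) return Double.NaN
    val mid = sorted.size / 2
    // Odd count: the middle element; even count: average of the two middle elements
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2.0
}
```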

This looks like an umbrella ticket and should be split into smaller tasks.