Inconsistency between `fmean` and `fsum`
Hello, firstly great package again as always! Just wanted to highlight a few inconsistencies and possibly an erroneous result.
I noticed that when calling fmean() on an integer vector with NA values and groups, the result doesn't return NA as one might expect. I suspect it has something to do with the integer overflow issue mentioned in the documentation, but I'd at least expect a similar erroneous result with fsum which I'm not seeing there.
I also noticed that the length of the result returned by fsum() doesn't match that of fmean() in the case of integer vectors.
In any case I've attached some simple examples to demonstrate the behaviour I'm seeing.
library(collapse)
#> collapse 2.0.16, see ?`collapse-package` or ?`collapse-documentation`
#>
#> Attaching package: 'collapse'
#> The following object is masked from 'package:stats':
#>
#> D
fsum(c(1L, NA), na.rm = F) #Ok
#> [1] NA
fmean(c(1L, NA), na.rm = F) #Ok
#> [1] NA
fsum(c(1L, NA), na.rm = F, g = c(1L, 1L)) #Ok
#> 1
#> NA
fmean(c(1L, NA), na.rm = F, g = c(1L, 1L)) #Not ok
#> 1
#> -1073741824
fsum(integer()) #Ok
#> integer(0)
fmean(integer()) #Not ok
#> [1] NA
fsum(numeric()) #Ok
#> numeric(0)
fmean(numeric()) #Ok
#> numeric(0)
fsum(integer(), na.rm = F, g = integer()) #Ok
#> named integer(0)
fmean(integer(), na.rm = F, g = integer()) #Not ok
#> <NA>
#> NA
fsum(numeric(), na.rm = F, g = integer()) #Ok
#> named numeric(0)
fmean(numeric(), na.rm = F, g = integer()) #Ok
#> named numeric(0)
Created on 2024-08-26 with reprex v2.1.0
Thanks! So actually all more involved numerical statistics (fmean, fmedian, fvar, fsd, fquantile) give NA when passed integer(), which is consistent with base R. What is inconsistent is that fsum(),fmin, and fmax() return integer(), instead of 0, Inf, -Inf. I can think about this for a while. I guess I didn't go with base R here because I did not like these defaults.
And yeah, fmean() for integers with groups currently does not do any checks if na.rm = FALSE. which gives a negative number here because NA_integer_ is the smallest integer in C. I'll see if I can fix that one.
I was wondering if a decision regarding the defaults had been reached? It's causing issues with one of my packages that expects a single row per group.
I personally think returning numeric() for 0-length inputs is the more logical and easy to program with regardless of whether groups are supplied.
This is of course a personal preference but whatever you decide to implement should be consistent across various inputs and scenarious.
Just to note also that both fmean and fmedian give numeric() when passed numeric() but return NA when passed integer().
Thanks for the reminder. I plan an update on the weekend and will look further into it then.