Survey.jl
SE=0 shown as really small values in `bydomain`
I noticed negligible standard errors (almost 0) in many domain estimation cases, like here:
julia> mean(:api00, :cname, bclus1)
11×3 DataFrame
 Row │ cname        mean     SE
     │ String15     Float64  Float64
─────┼────────────────────────────────────
   1 │ Santa Clara  732.077  58.2169
   2 │ San Diego    659.436   2.66703
   3 │ Merced       519.25    2.28936e-15
   4 │ Los Angeles  647.267  47.6233
   5 │ Orange       710.563   2.19826e-13
   6 │ Fresno       472.0     1.13687e-13
   7 │ Plumas       709.556   1.26058e-13
   8 │ Alameda      669.0     1.27527e-13
   9 │ San Joaquin  551.189   2.1791e-13
  10 │ Kern         452.5     0.0
  11 │ Mendocino    623.25    1.09545e-13
Are these just floating-point zeros? Should they be printed as "0.0", as for Kern?
But I'm pretty sure these are not lonely PSU cases; there are more than 2 values to calculate a variance from?
@ayushpatnaikgit
This is not a floating-point error but the actual result of the standard error of the mean calculation. Should we just approximate the error to 0?
This is a negligible error. I think those domains have a single value. Calculating a variance from a single number has a few options: NaN, missing, or just 0.
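For reference, a minimal sketch of what Julia's `Statistics` standard library returns for a one-element vector. The value `519.25` is only borrowed from the Merced row for illustration; whether that domain really has a single observation is not confirmed here.

```julia
using Statistics

single = [519.25]   # illustrative one-observation domain (assumption)

v_sample = var(single)                    # NaN: the n - 1 denominator is zero
v_pop    = var(single; corrected=false)   # 0.0: population variance of one value
s_sample = std(single)                    # NaN, for the same reason as var

# `missing` is never produced automatically; it has to be chosen explicitly.
n  = length(single)
se = n > 1 ? std(single) / sqrt(n) : missing
```

So the three options map to the default (`NaN`), `corrected=false` (`0.0`), and an explicit guard (`missing`).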
@sayantikaSSG @itsdebartha can you check whether the really small SE values are coming from domains of size 1? If so, add a condition in `bydomain` to return a NaN/NA value for these domains. When printing, these domains should also print NA.
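A rough sketch of such a guard, under loud assumptions: `domain_se` is a hypothetical helper, not part of Survey.jl, and it uses the plain simple-random-sample formula `std(x) / sqrt(n)`, ignoring weights and clustering. The input numbers are made up for illustration. It only shows where the size check would go.

```julia
using Statistics

# Hypothetical helper (not Survey.jl API): SE of the mean per domain,
# returning `missing` when a domain has fewer than two observations.
function domain_se(values::AbstractVector{<:Real}, domains::AbstractVector)
    result = Dict{eltype(domains), Union{Missing, Float64}}()
    for d in unique(domains)
        x = values[domains .== d]   # observations belonging to domain d
        n = length(x)
        result[d] = n > 1 ? std(x) / sqrt(n) : missing
    end
    return result
end

# Made-up data: one domain of size 1, one of size 3.
api    = [452.5, 700.0, 720.0, 776.0]
county = ["Kern", "Santa Clara", "Santa Clara", "Santa Clara"]
se     = domain_se(api, county)
```

With `missing` as the element, a `DataFrame` column built from these values prints `missing` for the size-1 domain rather than a tiny float.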
@ayushpatnaikgit @codetalker7 I'm assuming it is fine to have NA values in numeric vectors in Julia? Are there any performance penalties or computational difficulties?
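As a small illustration of how Julia's built-in `missing` behaves in a numeric vector (a sketch using only Base and `Statistics`, with two SE values copied from the table above): the element type becomes `Union{Missing, Float64}`, arithmetic propagates `missing`, and `skipmissing` or `coalesce` opt out of that.

```julia
using Statistics

se = [58.2169, 2.66703, missing]   # Vector{Union{Missing, Float64}}

T      = eltype(se)                # Union{Missing, Float64}
total  = sum(se)                   # missing: it propagates through arithmetic
m      = mean(skipmissing(se))     # mean over the non-missing values only
filled = coalesce.(se, NaN)        # plain Vector{Float64}, NaN where missing was
```

Using `NaN` instead keeps the column a plain `Vector{Float64}`, at the cost of not distinguishing "not computable" from a genuine numerical NaN.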