StatsBase.jl

Accept reinterpreted arrays for weighted covariance calculation

Open • tscode opened this issue 2 years ago • 1 comment

I like to keep observed variables in structs and use reinterpret to efficiently convert a vector of observations to a matrix. However, this fails with cov when weights are passed. Consider this example (I am on Julia 1.9.0):

using StatsBase

struct Observation
  a :: Float64
  b :: Float64
end

obs = [Observation(1., 2.), Observation(2., 4.)]
weights = [0.5, 0.5]

data = reinterpret(reshape, Float64, obs) # get a 'reinterpreted' Float64 matrix
cov(data, dims = 2, corrected = false) # works as expected
cov(data, Weights(weights), 2) # fails

data = unsafe_wrap(Array, pointer(data), size(data))
cov(data, Weights(weights), 2) # works as expected

The issue is that reinterpret(reshape, Float64, obs) does not yield a subtype of DenseMatrix, even though it is a dense matrix. I am not sure if this issue should be resolved by StatsBase or can be tackled upstream.
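The type relationship described above can be checked with Base alone. This is a minimal sketch that reuses the Observation struct from the example; the variable names is_dense and is_abstract are only illustrative.

```julia
# Base only; no packages are needed for this check.
struct Observation
    a::Float64
    b::Float64
end

obs = [Observation(1.0, 2.0), Observation(2.0, 4.0)]

# 2×2 view over the memory of obs; each column holds one observation.
data = reinterpret(reshape, Float64, obs)

is_dense = data isa DenseMatrix        # false: the wrapper type is not a DenseArray
is_abstract = data isa AbstractMatrix  # true

# Copying into a plain Array yields a Matrix{Float64}, which is a DenseMatrix.
copied = collect(data)
```

Copying with collect(data) costs an allocation, but it avoids working with a raw pointer as the unsafe_wrap workaround does.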

tscode • Jul 13 '23 15:07

> I am not sure if this issue should be resolved by StatsBase or can be tackled upstream.

The relevant method is defined by StatsBase, so this is the right place. Thanks for the report!

> The issue is that reinterpret(reshape, Float64, obs) does not yield a subtype of DenseMatrix

Based on a brief look at the implementation, I don't see a reason why the methods need to be restricted to DenseMatrix; it seems to me that any AbstractMatrix could work, including your example. We could try simply loosening the argument's type restriction for the relevant method definitions.
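To illustrate why an AbstractMatrix signature can be enough, here is a rough sketch of an uncorrected weighted covariance written only against the generic array interface. The function weighted_cov_sketch is hypothetical and is not the StatsBase implementation; it only assumes that the input supports indexing, matrix products, and broadcasting.

```julia
using LinearAlgebra  # standard library, used for Diagonal

struct Observation
    a::Float64
    b::Float64
end

# Hypothetical helper, not part of StatsBase: uncorrected weighted covariance
# of a matrix whose observations are laid out along dimension `dims`.
function weighted_cov_sketch(x::AbstractMatrix{<:Real}, w::AbstractVector{<:Real}, dims::Integer)
    dims in (1, 2) || throw(ArgumentError("dims must be 1 or 2"))
    # Arrange the data so that rows are variables and columns are observations.
    vars_by_obs = dims == 2 ? x : transpose(x)
    size(vars_by_obs, 2) == length(w) ||
        throw(DimensionMismatch("one weight per observation is required"))
    wsum = sum(w)
    mu = (vars_by_obs * w) ./ wsum   # weighted mean of each variable
    centered = vars_by_obs .- mu     # subtract the mean from every observation
    return (centered * Diagonal(w) * transpose(centered)) ./ wsum
end

obs = [Observation(1.0, 2.0), Observation(2.0, 4.0)]
data = reinterpret(reshape, Float64, obs)  # not a DenseMatrix

C = weighted_cov_sketch(data, [0.5, 0.5], 2)  # ≈ [0.25 0.5; 0.5 1.0]
```

Nothing in the sketch depends on the memory layout of x, so the reinterpreted array is accepted without copying.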

ararslan • Jul 13 '23 18:07