SnpArrays.jl

Best way to perform linear algebra on a subset of SnpArray v0.7?

Open biona001 opened this issue 6 years ago • 1 comments

For my application I often need to perform linear algebra on subsets of a SnpArray. What is the best way to do this?

My main problem is that a SnpBitMatrix cannot be created from x_subset1 or x_subset2:

using LinearAlgebra, SnpArrays
x = SnpArray(undef, 10000, 10000)
x_subset1 = x[1:1000, 1:1000]
x_subset2 = @view x[1:1000, 1:1000]
julia> x_subsetbm1 = SnpBitMatrix{Float64}(x_subset1, center=true, scale=true);

ERROR: MethodError: no method matching SnpBitMatrix{Float64}(::Array{UInt8,2}; center=true, scale=true)
Closest candidates are:
  SnpBitMatrix{Float64}(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any) where T at /Users/biona001/.julia/packages/SnpArrays/pfwqg/src/linalg.jl:2 got unsupported keyword arguments "center", "scale"
  SnpBitMatrix{Float64}(::Any) where T<:AbstractArray at abstractarray.jl:22 got unsupported keyword arguments "center", "scale"
  SnpBitMatrix{Float64}(::SnpArray; model, center, scale) where T<:AbstractFloat at /Users/biona001/.julia/packages/SnpArrays/pfwqg/src/linalg.jl:19
Stacktrace:
 [1] top-level scope at none:0
julia> x_subsetbm2 = SnpBitMatrix{Float64}(x_subset2, center=true, scale=true);

ERROR: MethodError: no method matching SnpBitMatrix{Float64}(::SubArray{UInt8,2,SnpArray,Tuple{UnitRange{Int64},UnitRange{Int64}},false}; center=true, scale=true)
Closest candidates are:
  SnpBitMatrix{Float64}(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any) where T at /Users/biona001/.julia/packages/SnpArrays/pfwqg/src/linalg.jl:2 got unsupported keyword arguments "center", "scale"
  SnpBitMatrix{Float64}(::Any) where T<:AbstractArray at abstractarray.jl:22 got unsupported keyword arguments "center", "scale"
  SnpBitMatrix{Float64}(::SnpArray; model, center, scale) where T<:AbstractFloat at /Users/biona001/.julia/packages/SnpArrays/pfwqg/src/linalg.jl:19
Stacktrace:
 [1] top-level scope at none:0

As a workaround, I could allocate a smaller SnpArray, copy the desired elements into it, and build a SnpBitMatrix from that copy to do the calculations, but copying the data first does not feel like a good solution:

x_subset3 = SnpArray(undef, 1000, 1000)
copyto!(x_subset3, @view x[1:1000, 1:1000])
xbm_subset = SnpBitMatrix{Float64}(x_subset3, model=ADDITIVE_MODEL, center=true, scale=true);
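
For reference, the copy-based workaround above could be wrapped in a small helper. This is only a sketch: `subset_bitmatrix` is a hypothetical name, not part of SnpArrays.jl, and it assumes the `SnpArray(undef, m, n)` constructor, `copyto!` from a view, and the `SnpBitMatrix{T}(::SnpArray; model, center, scale)` constructor behave as in the snippets and error messages in this post.

```julia
using SnpArrays

# Hypothetical helper (not part of SnpArrays.jl): build a SnpBitMatrix from a
# rectangular subset of a SnpArray by first copying the subset into a new,
# smaller SnpArray. Keyword arguments (model, center, scale) are forwarded
# unchanged to the SnpBitMatrix constructor.
function subset_bitmatrix(::Type{T}, x::SnpArray, rows, cols; kwargs...) where {T<:AbstractFloat}
    # Allocate a SnpArray just large enough to hold the requested subset.
    xsub = SnpArray(undef, length(rows), length(cols))
    # Copy the selected genotypes; the view avoids an intermediate Array{UInt8,2}.
    copyto!(xsub, @view x[rows, cols])
    return SnpBitMatrix{T}(xsub; kwargs...)
end
```

With this, `xbm_subset = subset_bitmatrix(Float64, x, 1:1000, 1:1000; center=true, scale=true)` would give the same result as the three lines above. It still pays for the copy, so it does not address the underlying question.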

biona001 • Jan 16 '19 04:01

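The issue is that indexing a SnpArray does not return another SnpArray. As the error messages show, plain indexing returns an Array{UInt8,2}, and @view returns a SubArray{UInt8,2} whose parent is the SnpArray; SnpBitMatrix has no constructor for either type. I can think of two options to resolve this: change indexing on SnpArray so that it returns a SubArray{SnpArray}, or add a SnpBitMatrix constructor that takes a SubArray{UInt8,2} as an argument. I think the latter would be more efficient.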

kose-y • Jan 16 '19 14:01