SnpArrays.jl
LoopVectorization fails on some operations?
Reproducing the issue:
using SnpArrays
mouse = SnpArray(SnpArrays.datadir("mouse.bed"))
A = SnpLinAlg{Float64}(mouse)
A*A'
This results in the following message:
┌ Warning: #= /home/alanderos/.julia/packages/SnpArrays/WBzpL/src/linalg_direct.jl:760 =#:
│ `LoopVectorization.check_args` on your inputs failed; running fallback `@inbounds @fastmath` loop instead.
│ Use `warn_check_args=false`, e.g. `@turbo warn_check_args=false ...`, to disable this warning.
└ @ SnpArrays ~/.julia/packages/LoopVectorization/FMfT8/src/condense_loopset.jl:1049
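For context, this fallback fires whenever LoopVectorization.check_args returns false for the loop's arrays, which it does for anything that is not a plain strided dense array. A minimal sketch (assuming only that check_args can be called directly on arrays, as the warning message suggests):

```julia
using LoopVectorization, SparseArrays

# check_args decides whether @turbo can handle the given arrays;
# plain dense arrays pass, sparse matrices are expected to fail,
# which is what triggers the @inbounds @fastmath fallback above.
LoopVectorization.check_args(rand(3, 3))         # expected: true
LoopVectorization.check_args(sprand(3, 3, 0.5))  # expected: false
```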
This happens when I start Julia with 10 threads on a 10-core machine. Here's some additional information:
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i9-10900KF CPU @ 3.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Note: The issue came up trying to compute A*B, where A is a SnpLinAlg and B is a SparseMatrixCSC. So other operations seem to run into the issue as well.
Sorry, probably my fault. At least the answer seems correct.
For a matrix X, I only tested A*X where A is a SnpLinAlg and X is a dense matrix. I'm not sure why it fails when X is another SnpLinAlg or a SparseMatrixCSC; I would need some time to figure this out.
I don't think it's anybody's fault, Ben. It's just that we have not designed a kernel for this specific operation. LoopVectorization is not really a good solution for SparseMatrixCSC. For SnpLinAlg x SnpLinAlg, the quick and dirty workaround is indeed transforming one of them into a numeric matrix. If it is large enough, I think it might still be faster than numeric matrix x numeric matrix, due to having slightly better cache efficiency.
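For concreteness, a sketch of that workaround, assuming the convert(Matrix{Float64}, ::SnpArray) method from SnpArrays (keyword arguments such as model/center/scale may be needed so the numeric matrix matches the SnpLinAlg's settings; the names here are otherwise from the original repro):

```julia
using SnpArrays

mouse = SnpArray(SnpArrays.datadir("mouse.bed"))
A = SnpLinAlg{Float64}(mouse)

# Materialize the second operand as a plain numeric matrix so the
# existing SnpLinAlg-times-dense kernel applies instead of the fallback.
B = permutedims(convert(Matrix{Float64}, mouse))  # p-by-n dense matrix
C = A * B   # stands in for A * A', up to centering/scaling choices
```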
Thank you both for the quick reply. I only used SnpLinAlg x SnpLinAlg as an example, since it throws the same warning and is incredibly slow, for me at least.
In the short term, I think it would be reasonable to restrict this and this to DenseMatrix{T} instead of AbstractMatrix{T}. Multiplication with a generic AbstractMatrix{T} should either throw an error, since it is technically not supported, or give a warning, since the current kernel is not suitable for that case.
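A hypothetical sketch of that restriction (method names, argument order, and bodies are illustrative only, not the actual SnpArrays code):

```julia
using LinearAlgebra

# Hypothetical: keep the fast path for dense right-hand sides only...
function LinearAlgebra.mul!(out::AbstractMatrix{T}, A::SnpLinAlg{T},
                            B::DenseMatrix{T}) where T
    # ... existing @turbo kernel would go here ...
end

# ...and turn a generic AbstractMatrix into an explicit error
# instead of silently running an unsuitable kernel.
function LinearAlgebra.mul!(out::AbstractMatrix{T}, A::SnpLinAlg{T},
                            B::AbstractMatrix{T}) where T
    throw(ArgumentError("SnpLinAlg multiplication only supports DenseMatrix inputs"))
end
```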
In the long term, it would be nice to round out support for matrix-matrix multiplication. I can probably implement something for SnpLinAlg times SparseMatrixCSC for my problem and contribute it as a starting point. I can't promise it will be the best, though.
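As a rough starting point, assuming nothing about SnpArrays internals beyond column indexing, a CSC-aware multiply could walk the stored entries of B column by column (a sketch, not tuned; the function name is made up):

```julia
using SparseArrays

# out = A * B for sparse B, accumulating one stored entry at a time.
# Works for any A that supports column views, e.g. a dense stand-in.
function snp_times_sparse!(out::AbstractMatrix{T}, A::AbstractMatrix{T},
                           B::SparseMatrixCSC{T}) where T
    size(A, 2) == size(B, 1) || throw(DimensionMismatch())
    fill!(out, zero(T))
    rows, vals = rowvals(B), nonzeros(B)
    for j in 1:size(B, 2)            # columns of B (and of out)
        for k in nzrange(B, j)       # stored entries in column j
            i, v = rows[k], vals[k]
            @views out[:, j] .+= v .* A[:, i]   # column accumulation
        end
    end
    return out
end
```

This only touches the nonzeros of B, so its cost scales with nnz(B) rather than with the full dimensions, which is the main win over the dense kernel for sparse inputs.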