LoopVectorization.jl
vmap and StaticVectors
As predicted, vmap gives me a tremendous boost in performance =) I'm hitting an error when mapping over static arrays, though:
julia> using LoopVectorization, StaticArrays
julia> vmap(exp, randn(8));
julia> vmap(exp, @SVector randn(8));
ERROR: conversion to pointer not defined for SArray{Tuple{8},Float64,1,8}
Currently, the @avx macro won't work either.
vmap and @avx call VectorizationBase.vectorizable and VectorizationBase.stridedpointer on each of the arrays, respectively.
These return structs holding the pointer (and, in the case of stridedpointer, also the array's strides, so that we can use Cartesian indexing).
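To make the "pointer plus strides" idea concrete, here is a minimal sketch in plain Julia. The names SketchStridedPointer, sketch_stridedpointer, and sketch_load are hypothetical and chosen only for illustration; this is not VectorizationBase's actual implementation, just the general shape of a struct that supports Cartesian-style loads from a raw pointer.

```julia
# Hypothetical stand-in for a strided-pointer struct; not VectorizationBase's actual type.
struct SketchStridedPointer{T,N}
    ptr::Ptr{T}               # raw pointer to the first element
    strides::NTuple{N,Int}    # strides in units of elements, one per dimension
end

# Only strided arrays are accepted, since they support `pointer` and `strides`.
function sketch_stridedpointer(A::StridedArray{T,N}) where {T,N}
    SketchStridedPointer{T,N}(pointer(A), strides(A))
end

# Load one element at a 1-based Cartesian index by computing the linear offset.
function sketch_load(p::SketchStridedPointer{T,N}, I::Vararg{Int,N}) where {T,N}
    offset = sum((I .- 1) .* p.strides)
    return unsafe_load(p.ptr, offset + 1)
end

A = reshape(collect(1.0:12.0), 3, 4)
p = sketch_stridedpointer(A)
# GC.@preserve keeps A alive while the raw pointer is in use.
loaded = GC.@preserve A sketch_load(p, 2, 3)
println(loaded == A[2, 3])
```

Nothing in this sketch works without a valid pointer, which is why array types that cannot provide one need a different code path.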
I've not updated PaddedMatrices for the new VectorizationBase and LoopVectorization yet (I'll register it once I do), but in the link you can see the workaround I used for the static array type that library defines.
The problem, as the error says, is that we can't get pointers to structs.
If at all possible, pointers are preferable to that workaround I linked. Instead of defining vector loads using LLVM intrinsics, it just uses a bunch of @inbounds getindex calls and wishes the compiler luck.
When the loads are masked, the compiler almost always generates suboptimal code.
More importantly, we need a pointer to be able to store.
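The pointer problem can be reproduced without StaticArrays. The sketch below defines TupleVec, a hypothetical immutable, tuple-backed vector standing in for a static array type, and pointer_defined, a hypothetical helper that checks whether pointer succeeds. It assumes only Base Julia.

```julia
# Hypothetical immutable, tuple-backed vector standing in for a static array type.
struct TupleVec{N,T} <: AbstractVector{T}
    data::NTuple{N,T}
end
Base.size(::TupleVec{N}) where {N} = (N,)
Base.getindex(v::TupleVec, i::Int) = v.data[i]

# Returns true when `pointer(x)` succeeds, false when it throws.
function pointer_defined(x)
    try
        pointer(x)
        return true
    catch
        return false
    end
end

heap_vec = randn(8)                      # ordinary heap-allocated Vector
tuple_vec = TupleVec(ntuple(float, 8))   # immutable, no stable memory address

println(pointer_defined(heap_vec))
println(pointer_defined(tuple_vec))
```

Reading from tuple_vec still works through getindex, but there is no address to store through, which matches the point about needing a pointer to be able to store.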
Possible solutions that come to mind:
1. One of the libraries adds the other as a dependency to define an SArray-specific overload of some base function, like in the linked example. This will be required by every library implementing an Array type without implementing pointer.
2. I make a method like the above the default, specifically overloading Array and other types that define pointers to have the current behavior. This requires adding dependencies for every library implementing their own mutable array type.
3. Combine 1. and 2. using ismutable(x) = typeof(x).mutable to choose between default methods. This will reduce the number of needed overloads by a little. Notably, this fails for any struct wrapping a mutable array, so LinearAlgebra.Adjoint, Base.SubArray, etc. will all still need to be special-cased.
4. Dispatch on StridedArray to make the default decision, because the StridedArrays interface requires Base.unsafe_convert(::Type{Ptr{T}}, A).
5. PR to Julia to add a query about whether pointers are defined to the AbstractArray interface.
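As a rough illustration of how options 3 and 4 differ, the sketch below uses a hypothetical has_pointer trait that dispatches on StridedArray, and compares it with a mutability check. has_pointer is not part of any library; Base.ismutable is used in place of the typeof(x).mutable definition above and assumes Julia 1.5 or later.

```julia
using LinearAlgebra  # provides the Adjoint wrapper type

# Hypothetical trait: default to "has a usable pointer" only for strided arrays.
has_pointer(::StridedArray) = true
has_pointer(::AbstractArray) = false

x = randn(8)
A = randn(4, 4)

println(has_pointer(x))             # Array is a DenseArray, hence a StridedArray
println(has_pointer(view(x, 1:4)))  # a contiguous view is also a StridedArray
println(has_pointer(A'))            # Adjoint is not a StridedArray

# The mutability check from option 3: wrappers around mutable arrays
# are themselves immutable structs, so they are misclassified.
println(ismutable(x))
println(ismutable(view(x, 1:4)))
println(ismutable(A'))
```

With the StridedArray dispatch, the view is classified correctly and only wrappers such as Adjoint need a special method, whereas the mutability check treats both the view and the adjoint as if they had no backing memory.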
I like 4. We are making memory layout assumptions by using raw pointers. Someone's AbstractArray type following those assumptions hopefully subtypes DenseArray. Alternatively, option 1. of providing a specific overload is still open to them.
Currently MArray is not a StridedArray (nor is LinearAlgebra.Adjoint{T,A} where {T,A<:StridedArray{T}}, but I can provide special methods for that).
To support writing to MArrays, StaticArrays.jl would then need to either make MArray a subtype of DenseArray, or follow approach 1.
EDIT:
https://github.com/JuliaArrays/StaticArrays.jl/blob/master/src/MArray.jl#L20
MArray is already a subtype of StaticArray. This means that to make it a subtype of DenseArray while SArray isn't, they'd have to make StaticArray a union, which would then prevent other libraries from defining their own types as subtypes of StaticArray, a big drawback the authors may be unenthusiastic about.
It is worth watching ArrayInterface.jl. That may be very useful for supporting StaticVectors, as well as other array types like struct of arrays and arrays of structs.