LoopVectorization.jl
vmap and StaticVectors
As predicted, vmap gives me a tremendous boost in performance =) I'm hitting an error when mapping over static arrays, though:
```julia
julia> vmap(exp, randn(8));

julia> vmap(exp, @SVector randn(8));
ERROR: conversion to pointer not defined for SArray{Tuple{8},Float64,1,8}
```
Currently, the @avx macro won't work either.
vmap and @avx call VectorizationBase.vectorizable and VectorizationBase.stridedpointer on each of the arrays, respectively.
These return structs holding the pointer (and, in the case of stridedpointer, also the array's strides so that we can support Cartesian indexing).
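To make the pointer-plus-strides idea concrete, here is a minimal sketch in plain Julia. SketchStridedPointer, sketch_stridedpointer and _offset are hypothetical names used only for illustration; this is not VectorizationBase's actual implementation, which is built around vectorized loads and stores rather than the scalar unsafe_load/unsafe_store! used here. It assumes the array defines pointer and strides (as Array does) and counts strides in elements.

```julia
# Hypothetical sketch: NOT VectorizationBase's actual types or API.
struct SketchStridedPointer{T,N}
    ptr::Ptr{T}             # address of the first element
    strides::NTuple{N,Int}  # step between neighbors along each dimension, in elements
end

# Assumes A defines `pointer` and `strides` (true for Array and strided views).
sketch_stridedpointer(A::AbstractArray{T,N}) where {T,N} =
    SketchStridedPointer{T,N}(pointer(A), strides(A))

# 0-based linear element offset of a 1-based Cartesian index.
@inline function _offset(p::SketchStridedPointer{T,N}, I::NTuple{N,Int}) where {T,N}
    o = 0
    for d in 1:N
        o += (I[d] - 1) * p.strides[d]
    end
    return o
end

# Load through the raw pointer (unsafe_load is 1-based).
Base.getindex(p::SketchStridedPointer{T,N}, I::Vararg{Int,N}) where {T,N} =
    unsafe_load(p.ptr, _offset(p, I) + 1)

# Store through the raw pointer: this is the part that needs real, mutable memory.
Base.setindex!(p::SketchStridedPointer{T,N}, x, I::Vararg{Int,N}) where {T,N} =
    unsafe_store!(p.ptr, convert(T, x), _offset(p, I) + 1)

A = reshape(collect(1.0:6.0), 2, 3)  # [1 3 5; 2 4 6]
x = GC.@preserve A begin             # keep A alive while its raw pointer is in use
    p = sketch_stridedpointer(A)
    p[1, 1] = 10.0                   # writes A[1, 1] through the pointer
    p[2, 3]                          # reads A[2, 3]; the block returns this value
end
```

Both the load and the store go through the pointer, which is exactly what an immutable SArray cannot hand out.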
I've not updated PaddedMatrices for the new VectorizationBase and LoopVectorization yet (I'll register it once I do), but in the link you can see the workaround I used for the static array type that library defines.
The problem, as the error says, is that we can't get pointers to structs.
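The failure can be reproduced without StaticArrays. The sketch below defines TupleVector, a hypothetical immutable vector that stores its elements in an NTuple, similar in spirit to SArray but not its real definition. Generic AbstractArray fallbacks such as sum work through getindex, but pointer has no memory to point at.

```julia
# Hypothetical immutable, tuple-backed vector: a stand-in for SArray, NOT its real definition.
struct TupleVector{N,T} <: AbstractVector{T}
    data::NTuple{N,T}
end
Base.size(::TupleVector{N}) where {N} = (N,)
Base.IndexStyle(::Type{<:TupleVector}) = IndexLinear()
Base.getindex(v::TupleVector, i::Int) = v.data[i]

v = TupleVector((1.0, 2.0, 3.0))
total = sum(v)  # generic AbstractArray code works through getindex

# There is no heap memory behind `v`, so asking for a pointer throws.
has_pointer = try
    pointer(v)
    true
catch err
    println("pointer(v) failed: ", sprint(showerror, err))
    false
end
```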
If at all possible, pointers are preferable to that workaround I linked. Instead of defining vector loads using LLVM intrinsics, it just uses a bunch of @inbounds getindexes and wishes the compiler luck.
When the loads are masked, the compiler almost always generates suboptimal code.
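For comparison, the getindex-based fallback amounts to something like the following sketch: gather W consecutive elements into a tuple with @inbounds getindex calls and leave it to the compiler to turn them into a vector load. load_by_getindex is a hypothetical name, the mask is a plain NTuple of Bools rather than a real SIMD mask type, and this is not the PaddedMatrices code.

```julia
# Hypothetical sketch of the getindex fallback: read W consecutive elements
# starting at index i into a tuple, hoping the compiler emits a vector load.
@inline function load_by_getindex(A::AbstractVector{T}, i::Int, ::Val{W}) where {T,W}
    return ntuple(k -> @inbounds(A[i + k - 1]), Val(W))
end

# Masked variant: lanes whose mask entry is false are never read and yield zero(T).
# This per-lane branch is what the compiler must recognize as a masked load.
@inline function load_by_getindex(A::AbstractVector{T}, i::Int, ::Val{W},
                                  mask::NTuple{W,Bool}) where {T,W}
    return ntuple(k -> mask[k] ? @inbounds(A[i + k - 1]) : zero(T), Val(W))
end

A = collect(1.0:8.0)
full = load_by_getindex(A, 5, Val(4))                             # (5.0, 6.0, 7.0, 8.0)
tail = load_by_getindex(A, 7, Val(4), (true, true, false, false)) # (7.0, 8.0, 0.0, 0.0)
```

Nothing here writes back to A; a matching store would need setindex!, which an immutable array does not support.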
More importantly, we need a pointer to be able to store.
Possible solutions that come to mind:
1. One of the libraries adds the other as a dependency to define an SArray-specific overload of some base function, like in the linked example. This will be required by every library implementing an Array type without implementing pointer.
2. I make a method like the above the default, specifically overloading Array and other types that define pointers to have the current behavior. This requires adding dependencies for every library implementing their own mutable array type.
3. Combine 1. and 2. using ismutable(x) = typeof(x).mutable to choose between default methods. This will reduce the number of needed overloads by a little. Notably, this fails for any struct wrapping a mutable array, so LinearAlgebra.Adjoint, Base.SubArray, etc. will all still need to be special-cased.
4. Dispatch on StridedArray to make the default decision, because the StridedArray interface requires Base.unsafe_convert(::Type{Ptr{T}}, A).
5. PR to Julia to add to the AbstractArray interface a query for whether pointers are defined.
I like 4. By using raw pointers, we are already making assumptions about memory layout, and an AbstractArray type that satisfies those assumptions will hopefully subtype DenseArray. Failing that, option 1 (providing a specific overload) is still open to its authors.
Currently MArray is not a StridedArray (nor is LinearAlgebra.Adjoint{T,A} where {T,A<:StridedArray{T}}, but I can provide special methods for that).
To support writing to MArrays, StaticArrays.jl would then need to either make MArray a subtype of DenseArray, or follow approach 1.
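Option 4, plus the special case for LinearAlgebra.Adjoint, could look roughly like the trait below. can_use_pointer is a hypothetical name, not a LoopVectorization function. The Adjoint method is restricted to real element types because for complex elements the adjoint also conjugates, so reading the parent's memory (with the strides swapped) would not give the adjoint's values. Everything else falls back to false, meaning the slower getindex path; an MArray would land there unless it subtypes DenseArray or gets its own method (option 1).

```julia
using LinearAlgebra

# Hypothetical trait: may we take a raw pointer to A's memory?
# Default: no, so a getindex-based fallback would be used.
can_use_pointer(::AbstractArray) = false

# The StridedArray interface requires Base.unsafe_convert(::Type{Ptr{T}}, A).
can_use_pointer(::StridedArray) = true

# Adjoint is not a StridedArray, but for real elements it is the parent's memory
# read with swapped strides, so it can opt in explicitly.
can_use_pointer(::LinearAlgebra.Adjoint{<:Real,<:StridedArray}) = true
```

Since Adjoint is not a subtype of StridedArray, the two `true` methods never overlap and dispatch stays unambiguous.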
EDIT:
https://github.com/JuliaArrays/StaticArrays.jl/blob/master/src/MArray.jl#L20
MArray is already a subtype of StaticArray. Because a Julia type can have only one direct supertype, making MArray a subtype of DenseArray while SArray isn't would mean turning StaticArray into a Union. Other libraries could then no longer define their own types as subtypes of StaticArray, a big drawback the authors may be unenthusiastic about.
It is worth watching ArrayInterface.jl. That may be very useful for supporting StaticVectors, as well as other array types like struct of arrays and arrays of structs.