LoopVectorization.jl vmap and StaticVectors

Open baggepinnen opened this issue 4 years ago • 2 comments

As predicted, vmap gives me a tremendous boost in performance =) I'm hitting an error when mapping over static arrays, though:

julia> vmap(exp, randn(8));

julia> vmap(exp, @SVector randn(8));
ERROR: conversion to pointer not defined for SArray{Tuple{8},Float64,1,8}

Jan 02 '20 08:01 baggepinnen

Currently, the @avx macro won't work either. vmap and @avx call VectorizationBase.vectorizable and VectorizationBase.stridedpointer on each of the arrays, respectively. These return structs holding the pointer (and, in the case of stridedpointer, also the array's strides so that we can use Cartesian indexing).
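As a rough illustration of a struct that holds a pointer plus strides, here is a minimal sketch in plain Julia. The names SketchStridedPointer, sketch_stridedpointer, sketch_load, and sketch_demo are hypothetical and exist only for this example; this is not the VectorizationBase implementation.

```julia
# Hypothetical illustration only; not the VectorizationBase implementation.
struct SketchStridedPointer{T,N}
    ptr::Ptr{T}              # address of the first element
    strides::NTuple{N,Int}   # step between neighbours along each dimension, in elements
end

# Only arrays with a known memory layout can supply both a pointer and strides.
sketch_stridedpointer(A::StridedArray{T,N}) where {T,N} =
    SketchStridedPointer{T,N}(pointer(A), strides(A))

# Load the element at the 1-based Cartesian index I.
function sketch_load(p::SketchStridedPointer{T,N}, I::Vararg{Int,N}) where {T,N}
    offset = sum((I .- 1) .* p.strides)     # element offset from the first element
    return unsafe_load(p.ptr, offset + 1)   # unsafe_load indexes from 1, in elements
end

function sketch_demo()
    A = reshape(collect(1.0:12.0), 3, 4)
    V = view(A, 1:2:3, :)                   # non-unit stride along the first dimension
    # Keep A alive while raw pointers into its memory are in use.
    vals = GC.@preserve A begin
        pA = sketch_stridedpointer(A)
        pV = sketch_stridedpointer(V)
        (sketch_load(pA, 2, 3), sketch_load(pV, 2, 2))
    end
    return vals
end
```

Calling sketch_demo() reads A[2, 3] and V[2, 2] through the raw pointer rather than through getindex, which is the kind of access that needs the array to have a real address.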

I've not updated PaddedMatrices for the new VectorizationBase and LoopVectorization yet (I'll register it once I do), but in the link you can see the workaround I used for the static array type that library defines. The problem, as the error says, is that we can't get pointers to structs.
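The failure can be reproduced without StaticArrays. The sketch below uses TupleVec, a hypothetical stand-in for a static vector (an immutable struct wrapping a tuple), and assumes only Base Julia.

```julia
# Hypothetical stand-in for a static vector: an immutable struct wrapping a tuple.
struct TupleVec{N,T} <: AbstractVector{T}
    data::NTuple{N,T}
end
Base.size(::TupleVec{N}) where {N} = (N,)
Base.getindex(v::TupleVec, i::Int) = v.data[i]

v = TupleVec((1.0, 2.0, 3.0))

# No pointer conversion is defined for this type, so pointer(v) throws.
pointer_failed = try
    pointer(v)
    false
catch
    true
end

# A plain Array owns heap memory, so a pointer to its data is available.
arr = collect(v)
first_via_pointer = GC.@preserve arr unsafe_load(pointer(arr))
```

Here pointer_failed ends up true, while the copy into a regular Array can be read through its pointer.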

If at all possible, pointers are preferable to that workaround I linked. Instead of defining vector loads using LLVM intrinsics, it just uses a bunch of @inbounds getindex calls and wishes the compiler luck. When the loads are masked, the compiler almost always generates suboptimal code. More importantly, we need a pointer to be able to store.
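In spirit, a getindex-based fallback looks something like the sketch below. getindex_map! is a hypothetical name and this is not the linked workaround itself; it only shows that loads can go through getindex for any AbstractVector, while the stores still need a mutable destination.

```julia
# Hypothetical fallback: loads go through getindex, so the input needs no pointer.
# The destination must still be mutable, because the results have to be stored somewhere.
function getindex_map!(f, dest::Vector, x::AbstractVector)
    # eachindex(dest, x) throws if the two index ranges do not match,
    # which is what makes the @inbounds below safe.
    @inbounds for i in eachindex(dest, x)
        dest[i] = f(x[i])
    end
    return dest
end

x = 1:4                                   # a range has no pointer, but supports getindex
dest = Vector{Float64}(undef, length(x))
getindex_map!(sqrt, dest, x)
```

Whether the compiler turns such a loop into good SIMD code is left to its auto-vectorizer, which is the weakness described above.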

Possible solutions that come to mind:

1. One of the libraries adds the other as a dependency to define an SArray-specific overload of some base function, like in the linked example. This will be required by every library implementing an Array type without implementing pointer.
2. I make a method like the above the default, specifically overloading Array and other types that define pointers to have the current behavior. This requires adding dependencies for every library implementing their own mutable array type.
3. Combine 1. and 2. using ismutable(x) = typeof(x).mutable to choose between default methods. This will reduce the number of needed overloads by a little. Notably, this fails for any struct wrapping a mutable array, so LinearAlgebra.Adjoint, Base.SubArray, etc. will all still need to be special-cased.
4. Dispatch on StridedArray to make the default decision, because the StridedArrays interface requires Base.unsafe_convert(::Type{Ptr{T}}, A).
5. Open a PR to Julia that adds a query to the AbstractArray interface for whether pointers are defined.
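The weakness of the ismutable-based choice (the third option) can be seen in a short sketch. It assumes Julia 1.5 or later, where Base.ismutable exists; the variable names are only for illustration.

```julia
a = randn(4)
wrapped = view(a, 1:2)

array_is_mutable = ismutable(a)         # Array is a mutable type
view_is_mutable = ismutable(wrapped)    # SubArray is an immutable struct around its parent

# The view still refers to ordinary array memory, so a pointer is defined for it.
view_has_pointer = pointer(wrapped) == pointer(a)
```

The view reports as immutable even though it has a valid pointer, so a mutability check alone would send it down the wrong path.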

I like option 4. We are making memory layout assumptions by using raw pointers. Someone's AbstractArray type following those assumptions hopefully subtypes DenseArray. Alternatively, option 1. of providing a specific overload is still open to them.
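A minimal sketch of what dispatching on StridedArray (option 4) could look like follows. uses_pointer_path is a hypothetical helper name, not part of LoopVectorization or VectorizationBase.

```julia
# Hypothetical trait-style helper; not part of LoopVectorization or VectorizationBase.
uses_pointer_path(::StridedArray) = true     # layout is known and a pointer is defined
uses_pointer_path(::AbstractArray) = false   # anything else falls back to getindex loads

checks = (
    uses_pointer_path(randn(8)),                    # Array
    uses_pointer_path(view(randn(8, 8), 1:4, 2)),   # strided view of an Array
    uses_pointer_path(1:8),                         # range: an AbstractArray without a pointer
)
```

A library whose array type meets the layout assumptions without subtyping DenseArray could add its own method for the helper, which corresponds to the specific-overload route.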

Currently MArray is not a StridedArray (nor is LinearAlgebra.Adjoint{T,A} where {T,A<:StridedArray{T}}, but I can provide special methods for that). To support writing to MArrays, StaticArrays.jl would then need to either make MArray a subtype of DenseArray, or follow approach 1.

EDIT: https://github.com/JuliaArrays/StaticArrays.jl/blob/master/src/MArray.jl#L20 MArray is already a subtype of StaticArray. This means to make it a subtype of DenseArray, while SArray isn't, they'd have to make StaticArray a union, which then would make it so that other libraries can't define their own types to be a subtype of StaticArray -- a big drawback the authors may be unenthusiastic about.
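The type-hierarchy constraint can be illustrated with toy types. Everything below (SketchStatic, SketchSArray, SketchMArray, SketchStaticUnion, ThirdPartyStatic) is hypothetical and only mimics the shape of the problem; it is not StaticArrays code.

```julia
# Hypothetical toy hierarchy mimicking the situation; not StaticArrays code.
abstract type SketchStatic{T,N} <: AbstractArray{T,N} end

struct SketchSArray{T} <: SketchStatic{T,1}
    data::NTuple{4,T}
end

# A type declares exactly one supertype, so this cannot also subtype DenseArray.
mutable struct SketchMArray{T} <: SketchStatic{T,1}
    data::NTuple{4,T}
end

is_static = SketchMArray{Float64} <: SketchStatic
is_dense = SketchMArray{Float64} <: DenseArray

# If the shared "static" type were a Union instead of an abstract type,
# other packages could no longer declare subtypes of it.
const SketchStaticUnion = Union{SketchSArray, SketchMArray}
union_subtyping_failed = try
    @eval struct ThirdPartyStatic <: SketchStaticUnion end
    false
catch
    true
end
```

The failed definition of ThirdPartyStatic is the drawback mentioned above: a Union is closed, whereas an abstract type can be extended by anyone.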

Jan 02 '20 14:01 chriselrod

It is worth watching ArrayInterface.jl. That may be very useful for supporting StaticVectors, as well as other array types like struct of arrays and arrays of structs.

Feb 11 '20 15:02 chriselrod
