
Failing gracefully

Open stev47 opened this issue 4 years ago • 32 comments

It might be worth thinking about falling back to plain loops when an expression isn't recognized, issuing a warning about not optimizing instead of raising a hard error.

Feel free to close if this doesn't align with your goals: errors are fine too if the expectation is that @avx should fail if it is unable to optimize.

stev47 avatar Apr 27 '20 20:04 stev47

Hi, thanks for the issue.

The two points at which problems can be noticed are:

  1. Macro-expand time.
  2. When compiling the generated function.

Macro-expansion happens immediately upon defining a function. It'll create a LoopSet object that describes the loop, and then summarizes it, creating a call to the generated function _avx_! (which is passed the summary, and all necessary variables, as arguments).

I think I'd prefer errors immediately at macro-expansion time, if possible. If I wanted fall-back behavior, I could just delete the @avx. You could use @test_broken in your test suite to monitor and find out if/when it starts to work (in which case you would replace @test_broken with @test).
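
For example (a sketch; sum_of_squares stands in for whatever @avx-using function is currently unsupported):

using Test
xs = rand(100)
# @test_broken records an error or failure as Broken; once the expression
# starts passing, it reports an unexpected pass, reminding you to upgrade
# it to @test.
@test_broken sum_of_squares(xs) ≈ sum(abs2, xs)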

When compiling the generated function, I would like to at least have type-based fall-back behavior. This could allow using either a blacklist or a whitelist, to make it easier to write generic code. If there are types involved that it can make fast, it will. Otherwise, the loops will run as written.
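
A rough sketch of how that type-based dispatch could look, using a generated function (hypothetical helper, not LoopVectorization's actual internals):

# Inside a generated function the vararg binds to a tuple of argument types,
# so the fast/slow choice is made once per method instantiation.
@generated function maybe_fast(fastpath, slowpath, arrays...)
    if all(A -> eltype(A) <: Union{Bool, Base.HWReal}, arrays)
        :(fastpath(arrays...))   # types we know how to make fast
    else
        :(slowpath(arrays...))   # run as written
    end
end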

chriselrod avatar Apr 28 '20 17:04 chriselrod

I've been enjoying some awesome 2x performance improvements from sticking an @avx in front of some lines of code. Unfortunately, this appears to have cost me the ability to pass Double64 into my function, or to differentiate it with ForwardDiff, which I also need to be able to do. Would there be a way for me to keep the @avx and have it fall back to standard code if the types involved are "exotic"? I could potentially maintain two different functions, but that would lead to a ton of code duplication.

When compiling the generated function, I would like to at least have type-based fall-back behavior. This could allow using either a blacklist or a whitelist, to make it easier to write generic code. If there are types involved that it can make fast, it will. Otherwise, the loops will run as written.

This sounds like it would be exactly what I need!

baggepinnen avatar Apr 29 '20 03:04 baggepinnen

The major obstacle to this is that the original loop expression is lost, and the LoopSet object doesn't preserve the original structure. It represents loops as a set of operations that depend on one another, external arrays, and specific loop indices.

Perhaps this is another argument for https://github.com/chriselrod/LoopVectorization.jl/issues/74

As a stop-gap, we could revive a modification of this code. @_avx_! could then be given the original loop expression, and check to confirm that the element type of each AbstractArray is in

julia> Union{Bool, Base.HWReal}
Union{Bool, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8}

If not, it'll return the original quote.
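
A minimal sketch of that element-type check (helper names are illustrative):

# true only for element types LoopVectorization can vectorize
check_type(::Type{T}) where {T} = T <: Union{Bool, Base.HWReal}

# @_avx_! would emit the optimized loop only if every array passes,
# and otherwise return the original loop expression unchanged
eltypes_supported(arrays...) = all(A -> check_type(eltype(A)), arrays)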

PRs welcome if anyone wants to try their hand at implementing this. ;)

BTW, you can see some examples of using it with ForwardDiff here: https://github.com/chriselrod/LoopVectorization.jl/issues/93

chriselrod avatar Apr 29 '20 04:04 chriselrod

I'm not sure if I understand what the @_avx macro is doing. Is it inserting some info about "I want to @avx this" rather than avx:ing it immediately?

I was contemplating whether a macro placed on the entire function definition could, from the following,

function f(a::T, b::Vector{T}) where T

generate two methods, the one above, and additionally this one

function f(a::T, b::Vector{T}) where T <: Base.HWReal

where only the last one would contain the @avx features and the other one would be completely free of @avx. That would make compilation faster for the @avx-free method, I guess, but I'm not sure how to make the macro on the function definition "see" the code before it's transformed by the inner @avx annotations.

baggepinnen avatar Apr 29 '20 05:04 baggepinnen

The file I linked no longer exists, and is no longer how things work.

The macro now:

julia> @macroexpand @avx for i ∈ eachindex(a,b)
               s += a[i]*b[i]
           end
quote
    var"##loopi#1406" = LoopVectorization.maybestaticrange(eachindex(a, b))
    var"##i_loop_lower_bound#1407" = LoopVectorization.maybestaticfirst(var"##loopi#1406")
    var"##i_loop_upper_bound#1408" = LoopVectorization.maybestaticlast(var"##loopi#1406")
    var"##vptr##_a" = LoopVectorization.stridedpointer(a)
    var"##vptr##_b" = LoopVectorization.stridedpointer(b)
    local var"##s_0"
    begin
        $(Expr(:gc_preserve, :(var"##s_0" = LoopVectorization._avx_!(Val{(0, 0)}(), Tuple{:LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x0000000000000001, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, LoopVectorization.memload, 0x01, 0x01), :LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x0000000000000001, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, LoopVectorization.memload, 0x02, 0x02), :LoopVectorization, :LOOPCONSTANTINSTRUCTION, LoopVectorization.OperationStruct(0x0000000000000000, 0x0000000000000000, 0x0000000000000001, 0x0000000000000000, LoopVectorization.constant, 0x00, 0x03), :LoopVectorization, :vfmadd_fast, LoopVectorization.OperationStruct(0x0000000000000001, 0x0000000000000001, 0x0000000000000000, 0x0000000000010203, LoopVectorization.compute, 0x00, 0x03)}, Tuple{LoopVectorization.ArrayRefStruct{:a,Symbol("##vptr##_a")}(0x0000000000000001, 0x0000000000000001, 0x0000000000000000), LoopVectorization.ArrayRefStruct{:b,Symbol("##vptr##_b")}(0x0000000000000001, 0x0000000000000001, 0x0000000000000000)}, Tuple{0, Tuple{4}, Tuple{3}, Tuple{}, Tuple{}, Tuple{}, Tuple{}}, Tuple{:i}, (var"##i_loop_lower_bound#1407":var"##i_loop_upper_bound#1408",), var"##vptr##_a", var"##vptr##_b", s)), :a, :b))
    end
    s = LoopVectorization.reduced_add(var"##s_0", s)
end

It generates a call to _avx_!, along with a summary of what _avx_! needs to know about the original loop. This is how it uses compile-time information (instead of just macro-expansion-time info) to optimize the loops.

but I'm not sure how to make the macro on the function definition "see" the code before it's transformed by the inner @avx annotations.

Outer macros expand first, so this shouldn't be a problem.

julia> a = rand(43); b = rand(43);

julia> s = 0.0;

julia> macro showex(x)
          esc(@show x)
       end
@showex (macro with 1 method)

julia> @showex @avx for i ∈ eachindex(a,b)
               s += a[i]*b[i]
           end
x = :(#= REPL[23]:1 =# @avx for i = eachindex(a, b)
          #= REPL[23]:2 =#
          s += a[i] * b[i]
      end)
11.942335582598114

LoopVectorization actually has to call macroexpand on inner macros so that it doesn't need to parse them itself, and so that things like Base.Cartesian.@nexprs work as intended for unrolling loops.
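
For example, inner macros are already plain expressions by the time the LoopSet is built; the output below is shown schematically (line-number comments omitted):

julia> macroexpand(Main, :(Base.Cartesian.@nexprs 2 j -> x_j = j^2))
quote
    x_1 = 1 ^ 2
    x_2 = 2 ^ 2
end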

I'd definitely accept a macro like that as a PR. Although I think it should be Union{Bool, Base.HWReal}, because it should support Bools, and I have a few tests for that.

chriselrod avatar Apr 29 '20 05:04 chriselrod

EDIT: hold on a second, I had made a mistake

baggepinnen avatar Apr 29 '20 06:04 baggepinnen

This seems to be working. Right now it only handles a single type parameter, but that can be improved if you find the approach okay

using MacroTools, LoopVectorization
macro maybeavx(ex)
    def = splitdef(ex)
    haskey(def,:whereparams) || throw(ArgumentError("Found no type parameter in the function definition"))
    length(def[:whereparams]) == 1 || throw(ArgumentError("Only a single type parameter supported"))
    T = def[:whereparams][1] # This is the type parameter

    # Remove @avx annotations
    ex_noavx = MacroTools.prewalk(ex) do x
        if x isa Expr && x.head === :macrocall
            return x.args[3] # This returns the expression inside the macrocall
        end
        x
    end
    quote
        # Define method without @avx annotations
        $(esc(ex_noavx))

        # Define the method that retains the @avx annotations
        $(esc(def[:name]))($(esc.(def[:args])...)) where $(esc(T)) <: Union{Bool, Base.HWReal} = $(esc(def[:body]))
    end
end


@maybeavx function f(a::T, b::Vector{T}) where T
    @avx a .+ b
end


using DoubleFloats
julia> f(1.0, ones(3))
3-element Array{Float64,1}:
 2.0
 2.0
 2.0

julia> f(Double64(1.0), Double64.(ones(3)))
3-element Array{Double64,1}:
 2.0
 2.0
 2.0

julia> @code_lowered f(1.0, ones(3))
CodeInfo(
1 ─      var"#266" = Base.broadcasted(Main.:+, a, b)
│   %2 = var"#266"
│   %3 = Core.apply_type(Main.Val, :Main)
│   %4 = (%3)()
│   %5 = LoopVectorization.vmaterialize(%2, %4)
│        var"#267" = %5
└──      return %5
)

julia> @code_lowered f(Double64(1.0), Double64.(ones(3)))
CodeInfo(
1 ─ %1 = Base.broadcasted(Main.:+, a, b)
│   %2 = Base.materialize(%1)
└──      return %2
)

baggepinnen avatar Apr 29 '20 06:04 baggepinnen

Such a macro sounds like a neat idea. I wonder if it shouldn't insert the two options as if/else in the body of the function, to avoid strange interactions with dispatch.

Perhaps too complicated, but I also wonder whether it should check that the arrays aren't sparse, or CuArrays, too. Something like this, with a, b extracted from the LoopSet, and check_types recursively calling parent to find something like Array{<:HWReal}:

@maybeavx function f(x, s)
    a, b = x .+ 1, x .- 1
    @avx for i ∈ eachindex(a,b)
        s += a[i]*b[i]
    end
    ...
# becomes:

function f(x, s)
    a, b = x .+ 1, x .- 1
    if check_types(a, b) 
        @avx for i ∈ eachindex(a,b)
            s += a[i]*b[i]
        end
   else
       for i ∈ eachindex(a,b)
            ...

Edit: if you want the if right in front of @avx, then there's little need for a second macro on the function. The two alternatives could simply be what @avx returns.
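
That is, the expansion could look roughly like this (a sketch of the shape; check_args is a hypothetical runtime check here):

# hypothetical expansion of `@avx for i ∈ eachindex(a,b); s += a[i]*b[i]; end`
if LoopVectorization.check_args(a, b)
    # ... the optimized _avx_! call summarized by the macro ...
else
    for i ∈ eachindex(a, b)   # the loop exactly as the user wrote it
        s += a[i] * b[i]
    end
end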

mcabbott avatar Apr 29 '20 07:04 mcabbott

@baggepinnen, I liked your approach until:

Edit: if you want the if right in front of @avx, then there's little need for a second macro on the function. The two alternatives could simply be what @avx returns.

Ha ha, that's a lot simpler than what I had suggested. If we wanted to replicate making it optional (like @maybeavx), @avx could have a keyword argument (such as force=false) that you can set to true to remove the check, if for some reason you thought it would work with some non-hw types.

A modification of the @maybeavx macro that returns two functions with different names (one with, and one without, @avx) would be useful for testing, to make sure the function is still returning the correct results. The test suite, for example, is filled with such nearly-duplicate function definitions.
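
Written out by hand, the pattern looks like this:

using LoopVectorization, Test

function mydot(a, b)        # plain version, used as the reference
    s = zero(eltype(a))
    for i in eachindex(a, b)
        s += a[i] * b[i]
    end
    s
end

function mydotavx(a, b)     # identical body, plus @avx
    s = zero(eltype(a))
    @avx for i in eachindex(a, b)
        s += a[i] * b[i]
    end
    s
end

a = rand(127); b = rand(127);
@test mydot(a, b) ≈ mydotavx(a, b)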

chriselrod avatar Apr 29 '20 12:04 chriselrod

Perhaps something like this:

check_valid_args() = true
check_valid_args(::Any) = false
check_valid_args(::Type{T}) where {T <: Union{Base.HWReal, Bool}} = true
check_valid_args(::T) where {T <: Number} = check_valid_args(T)
check_valid_args(::StridedArray{T}) where {T} = check_valid_args(T)
check_valid_args(::AbstractRange{T}) where {T} = check_valid_args(T)
check_valid_args(a, b, args...) = check_valid_args(a) && check_valid_args(b) && check_valid_args(args...)

This would let you opt in an array type MyArray either by defining:

struct MyArray{T,N} <: DenseArray{T,N}
...
end

or

LoopVectorization.check_valid_args(::MyArray{T}) where {T} = LoopVectorization.check_valid_args(T)

And to opt in your own element type, MyElementType, similarly via

LoopVectorization.check_valid_args(::Type{MyElementType}) = true

Some additional definitions:

using LinearAlgebra
check_valid_args(A::Adjoint) = check_valid_args(parent(A))
check_valid_args(A::Transpose) = check_valid_args(parent(A))
using OffsetArrays
check_valid_args(A::OffsetArray) = check_valid_args(parent(A))

Anything else I'm missing? I don't think I should make a generic AbstractArray fallback with parent, because odds are that the wrapper type will need explicit support if it didn't subtype DenseArray.

chriselrod avatar Apr 29 '20 13:04 chriselrod

My suggestion for how to check applicability was going to be something ... not so different from yours, but allowing for wrappers such as those of NamedDims, which don't do anything this has to know about, but are nice to pass along.

mcabbott avatar Apr 29 '20 13:04 mcabbott

My problem with that generic definition is

I don't think I should make a generic AbstractArray fallback with parent, because odds are that the wrapper type will need explicit support if it didn't subtype DenseArray

For example:

struct MyOffsetArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
    offsets::NTuple{N,Int}
end
Base.@propagate_inbounds function Base.getindex(A::MyOffsetArray{T,N}, i::Vararg{Int,N}) where {T,N}
    A.data[(i .+ A.offsets)...]
end
Base.parent(A::MyOffsetArray) = A.data
# Other methods, including `Base.strides` and `Base.pointer`/`Base.unsafe_convert(Ptr{T}, ...)`

But maybe that's okay. The array types that don't work correctly:

julia> pointer(A')
ERROR: conversion to pointer not defined for Adjoint{Float64,Array{Float64,2}}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] unsafe_convert(::Type{Ptr{Float64}}, ::Adjoint{Float64,Array{Float64,2}}) at ./pointer.jl:67
 [3] pointer(::Adjoint{Float64,Array{Float64,2}}) at ./abstractarray.jl:933
 [4] top-level scope at REPL[27]:1

julia> pointer(OffsetArray(A,1,1))
ERROR: conversion to pointer not defined for OffsetArray{Float64,2,Array{Float64,2}}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] unsafe_convert(::Type{Ptr{Float64}}, ::OffsetArray{Float64,2,Array{Float64,2}}) at ./pointer.jl:67
 [3] pointer(::OffsetArray{Float64,2,Array{Float64,2}}) at ./abstractarray.jl:933
 [4] top-level scope at REPL[28]:1

julia> pointer(A)
Ptr{Float64} @0x000055bfbb939f40

These don't define some of the needed methods, meaning users will get errors and at least learn about it.

EDIT: It's a question of how far on the opt-in vs opt-out spectrum we should be. When it comes to things like NamedArray, it is unfortunate that it'd be type piracy for someone else to make it work.

It would be great if there were an acts_like_a_normal_strided_array trait that everyone implemented.

chriselrod avatar Apr 29 '20 13:04 chriselrod

I guess the clear criteria are:

  • false for CuArrays (which we can't see) and sparse arrays
  • false for reshapes which aren't StridedArrays, and for Diagonal, UpperTriangular etc (these wrappers we could see)
  • true for wrappers PermutedDimsArray, Transpose. And Adjoint, since it will only contain real numbers, and https://github.com/JuliaLang/julia/pull/29135 will define strides & pointer.

As you say, after that it depends how optimistic you want to be about weird wrappers. Besides OffsetArrays (for which one could test axes(A)), are there any defined by packages which break things? NamedDimsArrays, and AxisArrays & friends, and NNlib.batched_transpose are examples of things which don't.

My PR here https://github.com/FluxML/NNlib.jl/pull/191/files has a go at making an is_strided which is optimistic about foreign wrappers. (But should probably reject OffsetArrays.)

(Re traits, there is https://github.com/JuliaMatrices/ArrayLayouts.jl, and I quite like https://github.com/JuliaLang/julia/pull/30432 but it hasn't been merged.)

mcabbott avatar Apr 29 '20 13:04 mcabbott

Okay, I'm fine with being more optimistic and using the recursive typeof(parent(A)) === typeof(A) trick.
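
Something like the following captures the idea (a condensed sketch, not the definitions that landed):

using LinearAlgebra

# Recursively unwrap via parent; for a plain Array (and other root types)
# parent(A) returns A itself, so the types match and the recursion stops.
function probably_strided(A::AbstractArray)
    P = parent(A)
    typeof(P) === typeof(A) ? A isa StridedArray : probably_strided(P)
end

probably_strided(rand(4, 4)')         # true: Adjoint wrapping an Array
probably_strided(Diagonal(rand(4)))   # false: its own parent, not strided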

I'll work on this tonight.

chriselrod avatar Apr 29 '20 14:04 chriselrod

Got around to it about a day late, but I did add it yesterday, using mcabbott's implementation.

Example from the unit tests, wrapper type that does not subtype DenseArray and does not define parent:

using LoopVectorization
struct FallbackArrayWrapper{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(A::FallbackArrayWrapper) = size(A.data)
Base.@propagate_inbounds Base.getindex(A::FallbackArrayWrapper, i::Vararg{Int, N}) where {N} = getindex(A.data, i...)
Base.@propagate_inbounds Base.setindex!(A::FallbackArrayWrapper, v, i::Vararg{Int, N}) where {N} = setindex!(A.data, v, i...)
Base.IndexStyle(::Type{<:FallbackArrayWrapper}) = IndexLinear()

function msd(x)
    s = zero(eltype(x))
    for i in eachindex(x)
        s += x[i] * x[i]
    end
    s
end
function msdavx(x)
    s = zero(eltype(x))
    @avx for i in eachindex(x)
        s += x[i] * x[i]
    end
    s
end

x = fill(sqrt(63.0), 128);
x[1] = 1e9;

This yields

julia> msd(x)
1.0e18

julia> msdavx(x)
1.0000000000000081e18

julia> msdavx(FallbackArrayWrapper(x))
1.0e18

The test is based on the fact that 1e18 + 63 == 1e18, so if it doesn't get unrolled, no matter how many times we add 63, we still have 1e18 at the end. If it does get unrolled however, each 63 that doesn't share a lane with 1e18 avoids getting eaten, and gets added on at the end.
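
Concretely:

julia> eps(1e18) # spacing between adjacent Float64s near 1e18
128.0

julia> 1e18 + 63 == 1e18 # 63 is less than half the spacing, so it is absorbed
true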

Although this raises the question: perhaps the fallback definition should use one or both of @inbounds and @fastmath (in which case I'd have to modify the above test)?

chriselrod avatar May 02 '20 17:05 chriselrod

I am not sure how this addresses

  • @avx not working for custom array types, e.g., FillArray
  • The user would still get an exception when calling the function if the array is not wrapped in the FallbackArrayWrapper

The macro-based approach would solve both those problems?

baggepinnen avatar May 04 '20 06:05 baggepinnen

julia> using FillArrays, LoopVectorization

julia> function msdavx(x)
           s = zero(eltype(x))
           @avx for i in eachindex(x)
               s += x[i] * x[i]
           end
           s
       end
msdavx (generic function with 1 method)

julia> msdavx(Ones(100))
100.0

Have an example of this not working? The intention was not for anyone to wrap their arrays with FallbackArrayWrapper (it wouldn't have been strictly defined to only hold Arrays if that were the case), but to show a fairly minimal custom array implementation that runs on the fallback loop. Should be clear that it isn't doing anything special to invoke the fallback behavior.

help?> LoopVectorization.check_args
  check_args(::Vararg{AbstractArray})


  LoopVectorization will optimize an @avx loop if check_args on each of the indexed abstract arrays returns true. It returns true for AbstractArray{T}s when check_type(T) == true and the array or its parent is a StridedArray or AbstractRange.

  To provide support for a custom array type, ensure that check_args returns true, either through overloading it or subtyping DenseArray. Additionally, define pointer and stride methods.

AbstractFill does not subtype StridedArray, and parent(::AbstractFill) doesn't either, so it fails the check.

If you have an error with a custom array type that is/has a parent that is a StridedArray, you can overload check_args to return false.

chriselrod avatar May 04 '20 08:05 chriselrod

AbstractFill does not subtype StridedArray

Note that StridedArray is not an abstract type that you can subtype from but a union, see also JuliaLang/julia#2345.

stev47 avatar May 04 '20 08:05 stev47

I think this is the right approach; I meant to dig a bit for edge cases but haven't yet.

Re FillArray, if I understand right this is being caught and sent to the non-vectorised path (since it's its own parent, and not a StridedArray). It could be allowed on the vectorised path, since it's just a constant, but might be messy. I got tired of avoiding this by inserting a trivial broadcast like gradient(x -> sum((x->x).( ...)), ...) and so wrote a special case for Fill, explicitly replacing indexing with one value. But only for one argument, dy, here.

Alternatively, I sometimes wonder if FillArray should define strides which are all zero... how many functions would accept such a thing?

mcabbott avatar May 04 '20 08:05 mcabbott

Ah I see, I misunderstood your comment to mean that one would wrap inputs in the fallback wrapper, but I understand now that it was only used to demonstrate that it worked :) Sorry for the noise. I'll update LV and see if it solves all the issues I had!

baggepinnen avatar May 04 '20 08:05 baggepinnen

Note that StridedArray is not an abstract type that you can subtype from but a union, see also JuliaLang/julia#2345.

Yes, DenseArray is an abstract type, and DenseArray <: StridedArray. This is what I did for custom array types defined in the unit tests.

There's another issue about implementing the strided interface as a trait instead, which would make it more flexible.

It could be allowed on the vectorised path, since it's just a constant, but might be messy.

LLVM can optimize simple loops with constants very well. If I add @inbounds (I think I'll make this happen automatically, so users don't have to) and use integers:

julia> function msdavx(x)
           s = zero(eltype(x))
           @avx for i in eachindex(x)
               @inbounds s += x[i]
           end
           s
       end
msdavx (generic function with 1 method)

julia> @code_llvm debuginfo=:none msdavx(Ones{Int}(100))

define i64 @julia_msdavx_2597([1 x [1 x [1 x i64]]] addrspace(11)* nocapture nonnull readonly dereferenceable(8)) {
top:
  %1 = getelementptr inbounds [1 x [1 x [1 x i64]]], [1 x [1 x [1 x i64]]] addrspace(11)* %0, i64 0, i64 0, i64 0, i64 0
  %2 = load i64, i64 addrspace(11)* %1, align 8
  %3 = icmp sgt i64 %2, 0
  %spec.select = select i1 %3, i64 %2, i64 0
  ret i64 %spec.select
}

It just checks whether the length is greater than 0 (icmp sgt i64 %2, 0), and selects the length if so, and 0 otherwise; after all, the sum of n ones is simply n.

so wrote a special case for Fill, explicitly replacing indexing with one value.

Yeah, that's much harder to do generically, but otherwise will work well.

Alternatively, I sometimes wonder if FillArray should define strides which are all zero... how many functions would accept such a thing?

Strides equal to 0 are useful for broadcasting a dimension when loading from a pointer, but in the case of FillArrays:

julia> foo(x) = @inbounds x[1000]
foo (generic function with 1 method)

julia> foo(Ones(2))
1.0

getindex doesn't actually care about the index value, except for the purpose of bounds checks. So I don't think this is necessary, at least in the way I've used 0-strides. Is there something I'm missing?

Ah I see, I misunderstood your comment to mean that one would wrap inputs in the fallback wrapper, but I understand now that it was only used to demonstrate that it worked :) Sorry for the noise. I'll update LV and see if it solved all the issues I had!

No problem. If you do run into trouble, consider overloading check_args, or possibly file an issue. Maybe it's a type that wont be too difficult to support.

chriselrod avatar May 04 '20 09:05 chriselrod

OK, I tried to make up some test cases. true --> false marks where the result changes with my attempt below these, instead of the one on master (linked above). It's not perfect, but it catches a few more things.

using LoopVectorization, LinearAlgebra, SparseArrays, FillArrays
# LoopVectorization v0.7.4, which does not use check_args

x1 = view(rand(10,10), 1:2:10, 1:3)
x1 isa StridedArray
check_args(x1) # true 
@avx x1 .+ 1 # ok

x2 = x1' # not a StridedArray
x2 = transpose(x1) # ditto
check_args(x2) # true
@avx x2 .+ 1 # ok

x3 = reshape(view(rand(10,10), 1:2:10, 1:3), 3, 5) # not a StridedArray
check_args(x3) # true -> false
@avx x3 .+ 1 # ERROR: MethodError: no method matching strides

x4 = view(x2, 1:3, 1:3) 
check_args(x4) # true --> false
@avx x4 .+ 1 # ERROR: conversion to pointer not defined

x5 = PermutedDimsArray(rand(10,10), (2,1))
check_args(x5) # true
@avx x5 .+ 1 # ok

x6 = view(x5, 1:3, 1:3) 
check_args(x6) # true --> false, too pessimistic about views of wrappers
@avx x6 .+ 1 # ok

x7 = Diagonal(rand(10))
check_args(x7) # true --> false
@avx x7 .+ 1 # MethodError: no method matching ndims

x8 = sparse(rand(3,3))
check_args(x8) # false
@avx x8 .+ 1 # no method matching similar(::Base.Broadcast.Broadcasted{SparseArrays

x9 = Fill(1.2, 3,4)
check_args(x9) # false
@avx x9 .+ 1 # MethodError: no method matching vmaterialize

@inline check_args(A::StridedArray) = check_type(eltype(A))
@inline check_args(A::AbstractRange) = check_type(eltype(A))
@inline function check_args(A::AbstractArray)
    M = parentmodule(typeof(A))
    if parent(A) === A # SparseMatrix, StaticArray, etc
        false
    elseif A isa Union{Transpose, Adjoint}
        check_args(parent(A))
    elseif M === Base || M === Core || M === LinearAlgebra
        # reshapes which aren't StridedArrays, plus UpperTriangular, etc.
        false
    else
        check_args(parent(A)) # PermutedDimsArray, NamedDimsArray
    end
end

mcabbott avatar May 05 '20 18:05 mcabbott

Thanks for the suggestions. I'm using that now, with the following additions to get check_args(x6) and similar to return true:

@inline check_args(A::SubArray) = check_args(parent(A))
@inline check_args(A::OffsetArray) = check_args(parent(A))
@inline check_args(A::Adjoint) = check_args(parent(A))
@inline check_args(A::Transpose) = check_args(parent(A))
@inline check_args(A::PermutedDimsArray) = check_args(parent(A))

The ::StridedArray definition seems to be more specific than the ::SubArray one, so strided subarrays dispatch there (rather than resulting in ambiguity errors). I also removed the elseif A isa Union{Transpose, Adjoint}, adding it to the other short one-liner definitions.

Thanks again for the suggestion. You're also welcome to make PRs of course.

chriselrod avatar May 06 '20 20:05 chriselrod

But not every view of something valid is still valid. StridedArray has some elaborate checks so that some views of an Array are, and some aren't, included. My elseif M === Base ... false test assumes the worst, and while it's too pessimistic about x6, a better example of what it's trying to catch would have been this:

x10 = view(x2, [1,3,2], 1:3) # x2::Transpose, which does not have strides(), but is OK
@avx x10 .+ 1  # ERROR: conversion to pointer not defined for SubArray{Float64,2...

Perhaps the next level is to test something like this:

x10.indices isa Tuple{Vararg{Union{Int,Colon,AbstractRange}}} # false, but true for x6

... and eventually we will end up re-writing StridedArray in every package!
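
In dispatch form, that test could become something like this (a sketch; note that Colon is stored in a view as Base.Slice, which is an AbstractRange):

# accept only SubArrays whose stored indices preserve a strided layout
@inline check_args(A::SubArray{<:Any,<:Any,<:Any,<:Tuple{Vararg{Union{Int,AbstractRange}}}}) =
    check_args(parent(A))
@inline check_args(A::SubArray) = false   # e.g. x10's vector-of-indices view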

This setup also doesn't reject CuArrays (which I had on the list above), but that's OK since the fallback loops won't work there either.

mcabbott avatar May 06 '20 22:05 mcabbott

Broadcasts also aren't using check_args yet, but you're right that x10 will pass when it shouldn't.

Which check do CuArrays pass? Shouldn't they fail on parent(A) === A?

Perhaps the next level is to test something like this:

x10.indices isa Tuple{Vararg{Union{Int,Colon,AbstractRange}}} # false, but true for x6

... and eventually we will end up re-writing StridedArray in every package!

I think you were being sarcastic, but...if that's what it takes to improve resolution?

chriselrod avatar May 06 '20 23:05 chriselrod

I was mostly, and I don't swear that I got that right! It would be great to have some shared trait-like way to query this.

I think CuArrays are <: StridedArray, but am at the wrong computer to check.

mcabbott avatar May 06 '20 23:05 mcabbott

I thought the point of being strided is to interface with C and Fortran libraries. E.g., making an array strided will cause many methods to dispatch to BLAS/LAPACK. While CuArrays probably are actually strided, passing them to your typical C or Fortran library won't work, so that seems like it'd be inappropriate subtyping.

I'm also in favor of the trait PR. I'd also like to support SubArrays with bitarray or index-vector inds. Aliasing won't be supported when storing into integer-indexed SubArrays. I recall broadcasting also resulting in undesirable behavior when that happens.

While they'd fail in loops, once broadcasting uses check_args it'd be nice to have CuArrays work there.

chriselrod avatar May 07 '20 08:05 chriselrod

Yes, I'm not sure what the logic was. On the "re-writing StridedArray in every package" front, well-behaved views of CuArrays are again of type CuArray, not SubArray, so that they can work with functions which dispatch on this. Again we should have some trait like storage_type which unwraps everything. There is also talk of doing some higher-order context thing, in which they would be Arrays within cuda() do ... end, but this is above my pay-grade.

Re broadcasting into aliased views, I think this goes wrong for CuArrays, and for @strided, but is allowed by Base. This came up here: https://github.com/JuliaGPU/CuArrays.jl/issues/684

mcabbott avatar May 07 '20 09:05 mcabbott

The CPU may have gotten the correct answer in that CuArrays issue, but it is not immune to such weirdness; this is the issue I was thinking of:

julia> x = [3]; v = view(x, [1,1])
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 3
 3

julia> x[[1,1]] .+= v
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 6
 6

julia> x = [3]; v = view(x, [1,1])
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 3
 3

julia> @views x[[1,1]] .+= v
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 9
 9

julia> x = [3]; I = [1,1]; v = view(x, I)
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 3
 3

julia> x[I] .+= v
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 9
 9

julia> x = [3]; I = [1,1]; v = view(x, I)
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 3
 3

julia> @views x[I] .+= v
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
 12
 12

Although this was with a single index. When it is supported in LoopVectorization, it'll behave more like the CuArrays version and have the same problem described in https://github.com/JuliaGPU/CuArrays.jl/issues/684.

I also realized that one of the package's benchmarks relied on this working for triangular matrices. That benchmark benefits substantially from LoopVectorization (although icc and ifort win), so its no longer working as-is is unfortunate: some cases that previously worked no longer do. I modified it to use parent and work on the underlying array. I'll update the doc strings of @avx and the documentation to make clear that this is the recommended practice for working with those array types, because unless the underlying data on the wrong side of the diagonal is actually zero, reading from it will yield incorrect results. And triangular iteration isn't supported yet.
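
A minimal sketch of that recommended practice (this assumes the storage below the diagonal really is zero, which UpperTriangular does not guarantee):

using LinearAlgebra, LoopVectorization

function sumavx_tri(U::UpperTriangular)
    A = parent(U)   # the underlying strided matrix
    s = zero(eltype(A))
    @avx for i in eachindex(A)
        s += A[i]   # also reads below the diagonal; correct only if zeros are stored there
    end
    s
end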

chriselrod avatar May 07 '20 13:05 chriselrod

Thanks for the aliasing links, these things get tricky fast.

It took me a while, but eventually I realised your UpperTriangular example only reads the diagonal, which is fine. The generic case is not, of course:

julia> rr = rand(3,3); mm = similar(rr); uu = UpperTriangular(rr)

julia> @avx for i in eachindex(uu)
       mm[i] = uu[i] + 10
       end

julia> mm
3×3 Array{Float64,2}:
 10.4958  10.9055  10.1759
 10.5271  10.5611  10.1929
 10.3082  10.0969  10.7074

However, on the latest version I instead get ERROR: UndefVarError: LinearAlgebra not defined from check_args.

mcabbott avatar May 07 '20 18:05 mcabbott