LoopVectorization.jl
Failing gracefully
It might be worth thinking about falling back to traditional loops if there are unrecognized expressions, issuing a warning about not optimizing instead of a hard error.
Feel free to close if this doesn't align with your goals: errors are fine too if the expectation is that @avx should fail when unable to optimize.
Hi, thanks for the issue.
There are two times at which problems can be noticed:
- Macro-expand time.
- When compiling the generated function.
Macro-expansion happens immediately upon defining a function. It'll create a LoopSet object that describes the loop, and then summarize it, creating a call to the generated function _avx_! (which is passed the summary and all necessary variables as arguments).
I think I'd prefer errors immediately at macro-expansion time, if possible. If I wanted fall-back behavior, I could just delete the @avx. You could use @test_broken in your test suite to monitor and find out if/when it starts to work (in which case you would replace @test_broken with @test).
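For instance, a minimal sketch of that pattern (the function definitions here are illustrative, not from the package):

using Test, LoopVectorization, DoubleFloats
f_plain(x) = sum(abs2, x)
function f_avx(x)
    s = zero(eltype(x))
    @avx for i in eachindex(x)
        s += x[i] * x[i]
    end
    s
end
x = Double64.(rand(8))
# Passes (recorded as Broken) while @avx errors on Double64;
# replace with @test once it starts working.
@test_broken f_avx(x) ≈ f_plain(x)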
When compiling the generated function, I would like to at least have type-based fall-back behavior. This could allow using either a blacklist or a whitelist to make it easier to write generic code. If there are types involved that it can make fast, it will. Otherwise, they'll run as written.
I've been enjoying some awesome 2x performance improvements from sticking an @avx in front of some lines of code. Unfortunately, this appears to have lost me the ability to pass Double64 to my function, or differentiate it with ForwardDiff, which I also need to be able to do. Would there be a way for me to keep the @avx and have it fall back to standard code if the types involved are "exotic"? I could potentially maintain two different functions, but that would lead to a ton of code duplication.
When compiling the generated function, I would like to at least have type-based fall-back behavior. This could allow using either a blacklist or a whitelist to make it easier to write generic code. If there are types involved that it can make fast, it will. Otherwise, they'll run as written.
This sounds like it would be exactly what I need!
The major obstacle to this is that the original loop expression is lost; the LoopSet object doesn't preserve the original structure. It represents loops as a set of operations that depend on one another, external arrays, and specific loop indices.
Perhaps this is another argument for https://github.com/chriselrod/LoopVectorization.jl/issues/74
We could revive a modification of this code as a stop-gap.
@_avx_! could then be given the original loop expression, and check to confirm that the element type of each AbstractArray is in
julia> Union{Bool, Base.HWReal}
Union{Bool, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8}
If not, it'll return the original quote.
PRs welcome if anyone wants to try their hand at implementing this. ;)
BTW, you can see some examples of using it with ForwardDiff here: https://github.com/chriselrod/LoopVectorization.jl/issues/93
I'm not sure I understand what the @_avx macro is doing. Is it inserting some info about "I want to @avx this" rather than avx-ing it immediately?
I was contemplating whether a macro could be put on the entire function definition so that, from the following
function f(a::T, b::Vector{T}) where T
it would generate two methods: the one above, and additionally this one
function f(a::T, b::Vector{T}) where T <: Base.HWReal
where only the last one would contain the @avx features and the other one would be completely free of @avx. That would make compilation faster for the @avx-free method, I guess, but I'm not sure how to make the macro on the function definition "see" the code before it's transformed by the inner @avx annotations.
The file I linked no longer exists, and is no longer how things work.
The macro now:
julia> @macroexpand @avx for i ∈ eachindex(a,b)
s += a[i]*b[i]
end
quote
var"##loopi#1406" = LoopVectorization.maybestaticrange(eachindex(a, b))
var"##i_loop_lower_bound#1407" = LoopVectorization.maybestaticfirst(var"##loopi#1406")
var"##i_loop_upper_bound#1408" = LoopVectorization.maybestaticlast(var"##loopi#1406")
var"##vptr##_a" = LoopVectorization.stridedpointer(a)
var"##vptr##_b" = LoopVectorization.stridedpointer(b)
local var"##s_0"
begin
$(Expr(:gc_preserve, :(var"##s_0" = LoopVectorization._avx_!(Val{(0, 0)}(), Tuple{:LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x0000000000000001, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, LoopVectorization.memload, 0x01, 0x01), :LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x0000000000000001, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, LoopVectorization.memload, 0x02, 0x02), :LoopVectorization, :LOOPCONSTANTINSTRUCTION, LoopVectorization.OperationStruct(0x0000000000000000, 0x0000000000000000, 0x0000000000000001, 0x0000000000000000, LoopVectorization.constant, 0x00, 0x03), :LoopVectorization, :vfmadd_fast, LoopVectorization.OperationStruct(0x0000000000000001, 0x0000000000000001, 0x0000000000000000, 0x0000000000010203, LoopVectorization.compute, 0x00, 0x03)}, Tuple{LoopVectorization.ArrayRefStruct{:a,Symbol("##vptr##_a")}(0x0000000000000001, 0x0000000000000001, 0x0000000000000000), LoopVectorization.ArrayRefStruct{:b,Symbol("##vptr##_b")}(0x0000000000000001, 0x0000000000000001, 0x0000000000000000)}, Tuple{0, Tuple{4}, Tuple{3}, Tuple{}, Tuple{}, Tuple{}, Tuple{}}, Tuple{:i}, (var"##i_loop_lower_bound#1407":var"##i_loop_upper_bound#1408",), var"##vptr##_a", var"##vptr##_b", s)), :a, :b))
end
s = LoopVectorization.reduced_add(var"##s_0", s)
end
It generates a call to _avx_!, along with a summary of what _avx_! needs to know about the original loop. This is how it uses compile-time information (instead of just macro-expansion-time info) to optimize the loops.
but I'm not sure how to make the macro on the function definition "see" the code before it's transformed by the inner @avx annotations.
Outer macros expand first, so this shouldn't be a problem.
julia> a = rand(43); b = rand(43);
julia> s = 0.0;
julia> macro showex(x)
esc(@show x)
end
@showex (macro with 1 method)
julia> @showex @avx for i ∈ eachindex(a,b)
s += a[i]*b[i]
end
x = :(#= REPL[23]:1 =# @avx for i = eachindex(a, b)
#= REPL[23]:2 =#
s += a[i] * b[i]
end)
11.942335582598114
LoopVectorization actually has to call macroexpand on inner macros, so that it can avoid parsing them itself and so that things like Base.Cartesian.@nexprs can work as intended for unrolling loops.
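For reference, @nexprs pastes N copies of its expression, which is why it has to be expanded before LoopVectorization parses the loop body:

julia> using Base.Cartesian

julia> @macroexpand @nexprs 3 i -> s += a[i]
quote
    s += a[1]
    s += a[2]
    s += a[3]
end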
I'd definitely accept a macro like that as a PR, although I think it should be Union{Bool, Base.HWReal}, because it should support Bools, and I have a few tests for it.
EDIT: hold on a second, I had made a mistake
This seems to be working. Right now it only handles a single type parameter, but that can be improved if you find the approach okay
using MacroTools, LoopVectorization
macro maybeavx(ex)
def = splitdef(ex)
haskey(def,:whereparams) || throw(ArgumentError("Found no type parameter in the function definition"))
length(def[:whereparams]) == 1 || throw(ArgumentError("Only a single type parameter supported"))
T = def[:whereparams][1] # This is the type parameter
# Remove @avx annotations (note: this strips every macro call, keeping the expression it wraps)
ex_noavx = MacroTools.prewalk(ex) do x
if x isa Expr && x.head === :macrocall
return x.args[3] # This returns the expression inside the macrocall
end
x
end
quote
# Define method without @avx annotations
$(esc(ex_noavx))
# Define the method that retains the @avx annotations
$(esc(def[:name]))($(esc.(def[:args])...)) where $(esc(T)) <: Union{Bool, Base.HWReal} = $(esc(def[:body]))
end
end
@maybeavx function f(a::T, b::Vector{T}) where T
@avx a .+ b
end
using DoubleFloats
julia> f(1.0, ones(3))
3-element Array{Float64,1}:
2.0
2.0
2.0
julia> f(Double64(1.0), Double64.(ones(3)))
3-element Array{Double64,1}:
2.0
2.0
2.0
julia> @code_lowered f(1.0, ones(3))
CodeInfo(
1 ─      var"##266" = Base.broadcasted(Main.:+, a, b)
│   %2 = var"##266"
│   %3 = Core.apply_type(Main.Val, :Main)
│   %4 = (%3)()
│   %5 = LoopVectorization.vmaterialize(%2, %4)
│        var"##267" = %5
└── return %5
)
julia> @code_lowered f(Double64(1.0), Double64.(ones(3)))
CodeInfo(
1 ─ %1 = Base.broadcasted(Main.:+, a, b)
│ %2 = Base.materialize(%1)
└── return %2
)
Such a macro sounds like a neat idea. I wonder if it shouldn't insert the two options as if/else in the body of the function, to avoid strange interactions with dispatch.
Perhaps too complicated, but I also wonder whether it should check that the arrays aren't sparse, or CuArrays, too. Something like this, with a, b extracted from the LoopSet, and check_types recursively calling parent to find something like Array{<:HWReal}:
@maybeavx function f(x, s)
a, b = x .+ 1, x .- 1
@avx for i ∈ eachindex(a,b)
s += a[i]*b[i]
end
...
# becomes:
function f(x, s)
a, b = x .+ 1, x .- 1
if check_types(a, b)
@avx for i ∈ eachindex(a,b)
s += a[i]*b[i]
end
else
for i ∈ eachindex(a,b)
...
Edit: if you want if right in front of @avx, then there's little need for a second macro on the function. The two alternatives could simply be what @avx returns.
@baggepinnen , I liked your approach until:
Edit: if you want if right in front of @avx, then there's little need for a second macro on the function. The two alternatives could simply be what @avx returns.
Ha ha, that's a lot simpler than what I had suggested.
If we wanted to replicate making it optional (like @maybeavx), @avx could have a keyword argument (such as force=false) that you can set to true to remove the check, if for some reason you thought it would work with some non-hw types.
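A hand-written sketch of what such an expansion would amount to, using the check_valid_args helper proposed below (the optimized branch is abbreviated to the macro we already have):

function dot_maybeavx(a, b, s)
    if check_valid_args(a, b)   # fast path for hardware-native types
        @avx for i in eachindex(a, b)
            s += a[i] * b[i]
        end
    else                        # fallback: the loop exactly as written
        for i in eachindex(a, b)
            s += a[i] * b[i]
        end
    end
    s
end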
A modification of the @maybeavx macro that returns two functions with different names (one with, and one without, @avx) would be useful for testing to make sure the function is still returning the correct results. The test suite, for example, is filled with such nearly-duplicate function definitions.
Perhaps something like this:
check_valid_args() = true
check_valid_args(::Any) = false
check_valid_args(::Type{T}) where {T <: Union{Base.HWReal, Bool}} = true
check_valid_args(::T) where {T <: Number} = check_valid_args(T)
check_valid_args(::StridedArray{T}) where {T} = check_valid_args(T)
check_valid_args(::AbstractRange{T}) where {T} = check_valid_args(T)
check_valid_args(a, b, args...) = check_valid_args(a) && check_valid_args(b) && check_valid_args(args...)
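A quick check of those definitions (BigFloat standing in for an "exotic" type):

julia> check_valid_args(rand(3), 1:10, 2.0)
true

julia> check_valid_args(BigFloat.(rand(3)))
false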
This would let you opt in an array type MyArray via either defining:
struct MyArray{T,N} <: DenseArray{T,N}
...
end
or
LoopVectorization.check_valid_args(::MyArray{T}) where {T} = LoopVectorization.check_valid_args(T)
And to opt in your own element type, MyElementType, similarly via
LoopVectorization.check_valid_args(::Type{MyElementType}) = true
Some additional definitions:
using LinearAlgebra
check_valid_args(A::Adjoint) = check_valid_args(parent(A))
check_valid_args(A::Transpose) = check_valid_args(parent(A))
using OffsetArrays
check_valid_args(A::OffsetArray) = check_valid_args(parent(A))
Anything else I'm missing?
I don't think I should make a generic AbstractArray fallback with parent, because odds are that the wrapper type will need explicit support if it didn't subtype DenseArray.
My suggestion for how to check applicability was going to be something ... not so different to yours, but allowing for wrappers such as those of NamedDims, which don't do anything this has to know about, but are nice to pass along.
My problem with that generic definition is
I don't think I should make a generic AbstractArray fallback with parent, because odds are that the wrapper type will need explicit support if it didn't subtype DenseArray
For example:
struct MyOffsetArray{T,N} <: AbstractArray{T,N}
data::Array{T,N}
offsets::NTuple{N,Int}
end
Base.@propagate_inbounds function Base.getindex(A::MyOffsetArray{T,N}, i::Vararg{Int,N}) where {T,N}
A.data[(i .+ A.offsets)...]
end
Base.parent(A::MyOffsetArray) = A.data
# Other methods, including `Base.strides` and `Base.pointer`/`Base.unsafe_convert(Ptr{T}, ...)`
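Spelled out, those remaining methods might look like this (a sketch; note that pointer-based access through them would bypass the offset logic in getindex, which is exactly the danger of a generic fallback):

Base.size(A::MyOffsetArray) = size(A.data)
Base.strides(A::MyOffsetArray) = strides(A.data)
Base.unsafe_convert(::Type{Ptr{T}}, A::MyOffsetArray{T}) where {T} = pointer(A.data)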
But maybe that's okay. Those array types that don't work correctly:
julia> pointer(A')
ERROR: conversion to pointer not defined for Adjoint{Float64,Array{Float64,2}}
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] unsafe_convert(::Type{Ptr{Float64}}, ::Adjoint{Float64,Array{Float64,2}}) at ./pointer.jl:67
[3] pointer(::Adjoint{Float64,Array{Float64,2}}) at ./abstractarray.jl:933
[4] top-level scope at REPL[27]:1
julia> pointer(OffsetArray(A,1,1))
ERROR: conversion to pointer not defined for OffsetArray{Float64,2,Array{Float64,2}}
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] unsafe_convert(::Type{Ptr{Float64}}, ::OffsetArray{Float64,2,Array{Float64,2}}) at ./pointer.jl:67
[3] pointer(::OffsetArray{Float64,2,Array{Float64,2}}) at ./abstractarray.jl:933
[4] top-level scope at REPL[28]:1
julia> pointer(A)
Ptr{Float64} @0x000055bfbb939f40
don't define some of the needed methods, meaning users will get errors and at least learn about it.
EDIT:
It's a question of how far on the opt-in vs opt-out spectrum we should be.
When it comes to things like NamedArray, it is unfortunate that it'd be type piracy for someone else to make it work. It would be great if there were an acts_like_a_normal_strided_array trait that everyone implemented.
I guess the clear criteria are:
- false for CuArrays (which we can't see) and sparse arrays
- false for reshapes which aren't StridedArrays, and for Diagonal, UpperTriangular etc (these wrappers we could see)
- true for wrappers PermutedDimsArray, Transpose. And Adjoint, since it will only contain real numbers, and https://github.com/JuliaLang/julia/pull/29135 will define strides & pointer.
As you say, after that it depends how optimistic you want to be about weird wrappers. Besides OffsetArrays (for which one could test axes(A)), are there any defined by packages which break things? NamedDimsArrays, AxisArrays & friends, and NNlib.batched_transpose are examples of things which don't.
My PR here https://github.com/FluxML/NNlib.jl/pull/191/files has a go at making an is_strided which is optimistic about foreign wrappers (but should probably reject OffsetArrays).
(Re traits, there is https://github.com/JuliaMatrices/ArrayLayouts.jl, and I quite like https://github.com/JuliaLang/julia/pull/30432 but it hasn't been merged.)
Okay, I'm fine with being more optimistic and using the recursive typeof(parent(A)) === typeof(A) trick.
I'll work on this tonight.
Got around to it about a day late, but I did add it yesterday, using mcabbot's implementation.
Example from the unit tests, a wrapper type that does not subtype DenseArray and does not define parent:
using LoopVectorization
struct FallbackArrayWrapper{T,N} <: AbstractArray{T,N}
data::Array{T,N}
end
Base.size(A::FallbackArrayWrapper) = size(A.data)
Base.@propagate_inbounds Base.getindex(A::FallbackArrayWrapper, i::Vararg{Int, N}) where {N} = getindex(A.data, i...)
Base.@propagate_inbounds Base.setindex!(A::FallbackArrayWrapper, v, i::Vararg{Int, N}) where {N} = setindex!(A.data, v, i...)
Base.IndexStyle(::Type{<:FallbackArrayWrapper}) = IndexLinear()
function msd(x)
s = zero(eltype(x))
for i in eachindex(x)
s += x[i] * x[i]
end
s
end
function msdavx(x)
s = zero(eltype(x))
@avx for i in eachindex(x)
s += x[i] * x[i]
end
s
end
x = fill(sqrt(63.0), 128);
x[1] = 1e9;
This yields
julia> msd(x)
1.0e18
julia> msdavx(x)
1.0000000000000081e18
julia> msdavx(FallbackArrayWrapper(x))
1.0e18
The test is based on the fact that 1e18 + 63 == 1e18, so if it doesn't get unrolled, no matter how many times we add 63, we still have 1e18 at the end. If it does get unrolled, however, each 63 that doesn't share a lane with 1e18 avoids getting eaten, and gets added on at the end.
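The arithmetic behind that: at 1e18, the spacing between adjacent Float64 values is 128, so a single 63 is rounded away, while the 127 copies accumulated across separate lanes are not:

julia> eps(1e18)
128.0

julia> 1e18 + 63 == 1e18
true

julia> 1e18 + 127 * 63 == 1e18
false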
Although, this raises the question: perhaps the fallback definition should use one or both of @inbounds and @fastmath (in which case I'd have to modify the above test)?
I am not sure how this addresses:
- @avx not working for custom array types, e.g., FillArray
- The user would still get an exception when calling the function if the array is not wrapped in the FallbackArrayWrapper

The macro-based approach would solve both those problems?
julia> using FillArrays, LoopVectorization
julia> function msdavx(x)
s = zero(eltype(x))
@avx for i in eachindex(x)
s += x[i] * x[i]
end
s
end
msdavx (generic function with 1 method)
julia> msdavx(Ones(100))
100.0
Have an example of this not working?
The intention was not for anyone to wrap their arrays with FallbackArrayWrapper (it wouldn't have been strictly defined to only hold Arrays if that were the case), but to show a fairly minimal custom array implementation that runs on the fallback loop. It should be clear that it isn't doing anything special to invoke the fallback behavior.
help?> LoopVectorization.check_args
check_args(::Vararg{AbstractArray})
LoopVectorization will optimize an @avx loop if check_args on each of the indexed abstract arrays returns true. It returns true for AbstractArray{T}s when check_type(T) == true and the array or its parent is a StridedArray or AbstractRange.
To provide support for a custom array type, ensure that check_args returns true, either through overloading it or subtyping DenseArray. Additionally, define pointer and stride methods.
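A quick check with the FallbackArrayWrapper defined above (x as before):

julia> LoopVectorization.check_args(x)
true

julia> LoopVectorization.check_args(FallbackArrayWrapper(x))
false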
AbstractFill does not subtype StridedArray, and parent(::AbstractFill) doesn't either, so it fails the check.
If you have an error with a custom array type that is/has a parent that is a StridedArray, you can overload check_args to return false.
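For example, the MyOffsetArray sketched earlier has an Array parent but offset indexing semantics, so it would need exactly this kind of opt-out:

# Opt out of the pointer-based fast path, falling back to the loop as written:
LoopVectorization.check_args(::MyOffsetArray) = false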
AbstractFill does not subtype StridedArray

Note that StridedArray is not an abstract type that you can subtype from, but a union; see also JuliaLang/julia#2345.
I think this is the right approach, meant to dig a bit for edge cases but haven't yet.
Re FillArray, if I understand right this is being caught and sent to the non-vectorised path (since it's its own parent, and not a StridedArray). It could be allowed on the vectorised path, since it's just a constant, but might be messy. I got tired of avoiding this by inserting a trivial broadcast like gradient(x -> sum((x->x).( ...)), ...) and so wrote a special case for Fill, explicitly replacing indexing with one value. But only for one argument, dy, here.
Alternatively, I sometimes wonder if FillArray should define strides which are all zero... how many functions would accept such a thing?
Ah I see, I misunderstood your comment to mean that one would wrap inputs in the fallback wrapper, but I understand now that it was only used to demonstrate that it worked :) Sorry for the noise. I'll update LV and see if it solved all the issues I had!
Note that StridedArray is not an abstract type that you can subtype from but a union, see also JuliaLang/julia#2345.

Yes, DenseArray is an abstract type, and DenseArray <: StridedArray. This is what I did for custom array types defined in the unit tests. There's another issue about implementing the strided interface as a trait instead, which would make it more flexible.
It could be allowed on the vectorised path, since it's just a constant, but might be messy.
LLVM can optimize simple loops with constants very well. If I add @inbounds (I think I'll make this happen automatically, so users don't have to) and use integers:
julia> function msdavx(x)
s = zero(eltype(x))
@avx for i in eachindex(x)
@inbounds s += x[i]
end
s
end
msdavx (generic function with 1 method)
julia> @code_llvm debuginfo=:none msdavx(Ones{Int}(100))
define i64 @julia_msdavx_2597([1 x [1 x [1 x i64]]] addrspace(11)* nocapture nonnull readonly dereferenceable(8)) {
top:
%1 = getelementptr inbounds [1 x [1 x [1 x i64]]], [1 x [1 x [1 x i64]]] addrspace(11)* %0, i64 0, i64 0, i64 0, i64 0
%2 = load i64, i64 addrspace(11)* %1, align 8
%3 = icmp sgt i64 %2, 0
%spec.select = select i1 %3, i64 %2, i64 0
ret i64 %spec.select
}
It just checks if the length is greater than 0 (icmp sgt i64 %2, 0), and selects either the length (if so) or 0 (otherwise).
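In other words, the whole loop has been folded into the Julia-level equivalent of (a paraphrase, not actual package code):

sum_ones(x::Ones{Int}) = max(length(x), 0)  # every element is 1, so the sum is the length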
so wrote a special case for Fill, explicitly replacing indexing with one value.
Yeah, that's much harder to do generically, but otherwise will work well.
Alternatively, I sometimes wonder if FillArray should define strides which are all zero... how many functions would accept such a thing?

Strides equal to 0 are useful for broadcasting a dimension when loading from a pointer, but in the case of FillArrays:
julia> foo(x) = @inbounds x[1000]
foo (generic function with 1 method)
julia> foo(Ones(2))
1.0
getindex doesn't actually care about the index value, except for the purpose of bounds checks. So I don't think this is necessary, at least in the way I've used 0-strides. Is there something I'm missing?
Ah I see, I misunderstood your comment to mean that one would wrap inputs in the fallback wrapper, but I understand now that it was only used to demonstrate that it worked :) Sorry for the noise. I'll update LV and see if it solved all the issues I had!
No problem. If you do run into trouble, consider overloading check_args, or possibly file an issue. Maybe it's a type that won't be too difficult to support.
OK, I tried to make up some test cases. true --> false marks where my attempt below (instead of the one on master, linked above) changes the result. It's not perfect, but it catches a few more things.
using LoopVectorization, LinearAlgebra, SparseArrays, FillArrays
# LoopVectorization v0.7.4, which does not use check_args
x1 = view(rand(10,10), 1:2:10, 1:3)
x1 isa StridedArray
check_args(x1) # true
@avx x1 .+ 1 # ok
x2 = x1' # not a StridedArray
x2 = transpose(x1) # ditto
check_args(x2) # true
@avx x2 .+ 1 # ok
x3 = reshape(view(rand(10,10), 1:2:10, 1:3), 3, 5) # not a StridedArray
check_args(x3) # true -> false
@avx x3 .+ 1 # ERROR: MethodError: no method matching strides
x4 = view(x2, 1:3, 1:3)
check_args(x4) # true --> false
@avx x4 .+ 1 # ERROR: conversion to pointer not defined
x5 = PermutedDimsArray(rand(10,10), (2,1))
check_args(x5) # true
@avx x5 .+ 1 # ok
x6 = view(x5, 1:3, 1:3)
check_args(x6) # true --> false, too pessimistic about views of wrappers
@avx x6 .+ 1 # ok
x7 = Diagonal(rand(10))
check_args(x7) # true --> false
@avx x7 .+ 1 # MethodError: no method matching ndims
x8 = sparse(rand(3,3))
check_args(x8) # false
@avx x8 .+ 1 # no method matching similar(::Base.Broadcast.Broadcasted{SparseArrays
x9 = Fill(1.2, 3,4)
check_args(x9) # false
@avx x9 .+ 1 # MethodError: no method matching vmaterialize
@inline check_args(A::StridedArray) = check_type(eltype(A))
@inline check_args(A::AbstractRange) = check_type(eltype(A))
@inline function check_args(A::AbstractArray)
M = parentmodule(typeof(A))
if parent(A) === A # SparseMatrix, StaticArray, etc
false
elseif A isa Union{Transpose, Adjoint}
check_args(parent(A))
elseif M === Base || M === Core || M === LinearAlgebra
# reshapes which aren't StridedArrays, plus UpperTriangular, etc.
false
else
check_args(parent(A)) # PermutedDimsArray, NamedDimsArray
end
end
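Re-running the failing cases against this definition reproduces the annotations above:

check_args(x3)  # false: reshape of a non-contiguous view, caught by M === Base
check_args(x4)  # false: view of a Transpose, also caught by M === Base
check_args(x6)  # false: view of a PermutedDimsArray — the pessimistic case noted above
check_args(x7)  # false: Diagonal, caught by M === LinearAlgebra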
Thanks for the suggestions. I'm using that now, with the following additions to get check_args(x6) and similar to return true:
@inline check_args(A::SubArray) = check_args(parent(A))
@inline check_args(A::OffsetArray) = check_args(parent(A))
@inline check_args(A::Adjoint) = check_args(parent(A))
@inline check_args(A::Transpose) = check_args(parent(A))
@inline check_args(A::PermutedDimsArray) = check_args(parent(A))
The ::StridedArray definition seems to be more specific than the ::SubArray one, so strided subarrays direct there (rather than resulting in ambiguity errors).
I also removed the elseif A isa Union{Transpose, Adjoint}, adding it to the other short one-liner definitions.
Thanks again for the suggestion. You're also welcome to make PRs of course.
But not every view of something valid is still valid. StridedArray has some elaborate checks so that some views of an Array are, and some aren't, included. My elseif M === Base ... false test assumes the worst, and while it's too pessimistic re x6, a better example of what it's trying to catch would have been this:
x10 = view(x2, [1,3,2], 1:3) # x2::Transpose, which does not have strides(), but is OK
@avx x10 .+ 1 # ERROR: conversion to pointer not defined for SubArray{Float64,2...
Perhaps the next level is to test something like this:
x10.indices isa Tuple{Vararg{Union{Int,Colon,AbstractRange}}} # false, but true for x6
... and eventually we will end up re-writing StridedArray in every package!
This setup also doesn't reject CuArrays (which I had on the list above), but that's OK since the fallback loops won't work there either.
Broadcasts also aren't using check_args yet, but you're right that x10 will pass when it shouldn't.
Which check do CuArrays pass? Shouldn't they fail on parent(A) === A?
Perhaps the next level is to test something like this:
x10.indices isa Tuple{Vararg{Union{Int,Colon,AbstractRange}}} # false, but true for x6
... and eventually we will end up re-writing StridedArray in every package!
I think you were being sarcastic, but...if that's what it takes to improve resolution?
I was mostly, and I don't swear that I got that right! It would be great to have some shared trait-like way to query this.
I think CuArrays are <: StridedArray, but am at the wrong computer to check.
I thought the point of being strided is to interface with C and Fortran libraries; e.g., making an array strided will cause many methods to dispatch to BLAS/LAPACK. While CuArrays probably are actually strided, passing them to your typical C or Fortran library won't work, so that seems like inappropriate subtyping.
I'm also in favor of the trait PR.
I'd also like to support SubArrays with bitarray or index-vector inds. Aliasing won't be supported when storing into integer-indexed SubArrays; I recall broadcasting also resulting in undesirable behavior when that happens. While they'd fail in loops, once broadcasting uses check_args it'd be nice to have CuArrays work there.
Yes, I'm not sure what the logic was. On the "re-writing StridedArray in every package" front, well-behaved views of CuArrays are again of type CuArray, not SubArray, so that they can work with functions which dispatch on this. Again, we should have some trait like storage_type which unwraps everything. There is also talk of doing some higher-order context thing, in which they would be Arrays within cuda() do ... end, but this is above my pay-grade.
Re broadcasting into aliased views, I think this goes wrong for CuArrays, and for @strided, but is allowed by Base. This came up here: https://github.com/JuliaGPU/CuArrays.jl/issues/684
The CPU may have gotten the correct answer in that CuArrays issue, but it is not immune from such weirdness; this is the issue I was thinking of:
julia> x = [3]; v = view(x, [1,1])
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
3
3
julia> x[[1,1]] .+= v
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
6
6
julia> x = [3]; v = view(x, [1,1])
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
3
3
julia> @views x[[1,1]] .+= v
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
9
9
julia> x = [3]; I = [1,1]; v = view(x, I)
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
3
3
julia> x[I] .+= v
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
9
9
julia> x = [3]; I = [1,1]; v = view(x, I)
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
3
3
julia> @views x[I] .+= v
2-element view(::Array{Int64,1}, [1, 1]) with eltype Int64:
12
12
Although this was with a single index. When it is supported in LoopVectorization, it'll behave more like the CuArrays version and have the same problem described in https://github.com/JuliaGPU/CuArrays.jl/issues/684.
Also, I realized one of the package's benchmarks relied on it working for triangular matrices. This benchmark benefits substantially from LoopVectorization (although icc and ifort win), so it no longer working as-is is unfortunate. So some cases that previously worked no longer do.
I modified it to use parent and work on the underlying array. I'll update the docstrings of @avx and the documentation to try and make clear that that's the recommended practice for working with those array types, because unless the underlying data on the wrong side of the diagonal is actually zero, reading from them will yield incorrect results. And triangular iteration isn't supported yet.
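A minimal sketch of that recommended practice (my example, correct only when the elements on the wrong side of the diagonal really are zero):

using LinearAlgebra, LoopVectorization
function trisum(U::UpperTriangular)
    B = parent(U)               # the dense underlying array; safe to pass to @avx
    s = zero(eltype(B))
    @avx for i in eachindex(B)  # iterates the full matrix, including the lower triangle
        s += B[i]
    end
    s
end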
Thanks for the aliasing links, these things get tricky fast.
It took me a while, but eventually I realised your UpperTriangular example is only reading the diagonal, which is fine. The generic case is not, of course:
julia> rr = rand(3,3); mm = similar(rr); uu = UpperTriangular(rr)
julia> @avx for i in eachindex(uu)
mm[i] = uu[i] + 10
end
julia> mm
3×3 Array{Float64,2}:
10.4958 10.9055 10.1759
10.5271 10.5611 10.1929
10.3082 10.0969 10.7074
However, on the latest version, I instead get ERROR: UndefVarError: LinearAlgebra not defined from check_args.