julia icon indicating copy to clipboard operation
julia copied to clipboard

Implement `circshift(::Tuple, ::Integer)`

Open aravindh-krishnamoorthy opened this issue 2 years ago • 27 comments

This PR implements the straightforward fix to #50732 proposed by @adienes in https://github.com/JuliaLang/julia/issues/50732#issuecomment-1658287178, along with a few additional (minor) optimisations to improve performance.

  • [X] Implementation
  • [X] Validate constant folding
  • [X] Include documentation strings
  • [X] Unify documentation with that of circshift/!. Note: documentation appears in the category Base->Arrays. Is this Ok? Ed: The answer seems to be yes.
  • [X] Unit tests
  • [X] ~~Double-check and isolate make crash with https://github.com/JuliaLang/julia/pull/52438/commits/c4a058d5af594d48fc954edbb970a5c006013e73, see https://github.com/JuliaLang/julia/pull/52438#issuecomment-1848939901~~ (excluded, no issue with the build environment.)
  • [X] Raise issue about register spills. Link: https://github.com/JuliaLang/julia/issues/52933

Checklist before getting this PR out of draft state:

  • [X] make compile check
  • [X] make docs compile check
  • [X] Passing unit tests

aravindh-krishnamoorthy avatar Dec 07 '23 10:12 aravindh-krishnamoorthy

Type stability with circshift(x::Tuple, shift::Int) = ntuple(j -> x[mod1(j-shift, length(x))], Val(length(x))).

With elements of a single type in the tuple, the function is type stable:
julia> x = (1, 2, 3, 4, 5)
julia> @code_warntype circshift(x, 1)
[...]
Body::NTuple{5, Int64}
[...]
julia> @code_warntype circshift(x, 1)
MethodInstance for circshift(::NTuple{5, Int64}, ::Int64)
  from circshift(x::Tuple, shift::Int64) @ Base tuple.jl:646
Arguments
  #self#::Core.Const(circshift)
  x::NTuple{5, Int64}
  shift::Int64
Locals
  #69::Base.var"#69#70"{NTuple{5, Int64}, Int64}
Body::NTuple{5, Int64}
1 ─ %1  = Base.ntuple::Core.Const(ntuple)
│   %2  = Base.:(var"#69#70")::Core.Const(Base.var"#69#70")
│   %3  = Core.typeof(x)::Core.Const(NTuple{5, Int64})
│   %4  = Core.typeof(shift)::Core.Const(Int64)
│   %5  = Core.apply_type(%2, %3, %4)::Core.Const(Base.var"#69#70"{NTuple{5, Int64}, Int64})
│         (#69 = %new(%5, x, shift))
│   %7  = #69::Base.var"#69#70"{NTuple{5, Int64}, Int64}
│   %8  = Base.Val::Core.Const(Val)
│   %9  = Base.length(x)::Core.Const(5)
│   %10 = (%8)(%9)::Core.Const(Val{5}())
│   %11 = (%1)(%7, %10)::NTuple{5, Int64}
└──       return %11
With elements of mixed types in the tuple, the function is not type stable? The output type is coloured red and is a tuple consisting of union of all mixed types:
julia> y = (1, 'a', -7.0)
(1, 'a', -7.0)
julia> @code_warntype circshift(y, 1)
[...]
Body::Tuple{Union{Char, Float64, Int64}, Union{Char, Float64, Int64}, Union{Char, Float64, Int64}}
[...]
julia> @code_warntype circshift(y, 1)
MethodInstance for circshift(::Tuple{Int64, Char, Float64}, ::Int64)
  from circshift(x::Tuple, shift::Int64) @ Base tuple.jl:646
Arguments
  #self#::Core.Const(circshift)
  x::Tuple{Int64, Char, Float64}
  shift::Int64
Locals
  #69::Base.var"#69#70"{Tuple{Int64, Char, Float64}, Int64}
Body::Tuple{Union{Char, Float64, Int64}, Union{Char, Float64, Int64}, Union{Char, Float64, Int64}}
1 ─ %1  = Base.ntuple::Core.Const(ntuple)
│   %2  = Base.:(var"#69#70")::Core.Const(Base.var"#69#70")
│   %3  = Core.typeof(x)::Core.Const(Tuple{Int64, Char, Float64})
│   %4  = Core.typeof(shift)::Core.Const(Int64)
│   %5  = Core.apply_type(%2, %3, %4)::Core.Const(Base.var"#69#70"{Tuple{Int64, Char, Float64}, Int64})
│         (#69 = %new(%5, x, shift))
│   %7  = #69::Base.var"#69#70"{Tuple{Int64, Char, Float64}, Int64}
│   %8  = Base.Val::Core.Const(Val)
│   %9  = Base.length(x)::Core.Const(3)
│   %10 = (%8)(%9)::Core.Const(Val{3}())
│   %11 = (%1)(%7, %10)::Tuple{Union{Char, Float64, Int64}, Union{Char, Float64, Int64}, Union{Char, Float64, Int64}}
└──       return %11

To be checked.

(Outdated. See below.)

aravindh-krishnamoorthy avatar Dec 07 '23 15:12 aravindh-krishnamoorthy

I don't think we need three methods here. On second look I'm not sure the NTuple approach really makes a difference, @ararslan maybe could you help clarify exactly what the benefit is?

@aplavin raised a good point on Slack that

The basic implementation is already type-stable, as long as shift value is somehow encoded in the type of that parameter. Otherwise stability is clearly impossible, because the output type would depend on the value of shift::Int

so really I think just

circshift(x::Tuple, shift::Val{T}) where T = ntuple(j -> x[mod1(j-T, length(x))], Val(length(x)))
circshift(x::Tuple, shift::Integer) = circshift(x, Val(shift))

can suffice

adienes avatar Dec 08 '23 15:12 adienes

On second look I'm not sure the NTuple approach really makes a difference, @ararslan maybe could you help clarify exactly what the benefit is?

Just code simplicity/clarity, which is subjective.

ararslan avatar Dec 08 '23 17:12 ararslan

I don't think we need three methods here. On second look I'm not sure the NTuple approach really makes a difference, @ararslan maybe could you help clarify exactly what the benefit is?

@aplavin raised a good point on Slack that

The basic implementation is already type-stable, as long as shift value is somehow encoded in the type of that parameter. Otherwise stability is clearly impossible, because the output type would depend on the value of shift::Int

so really I think just

circshift(x::Tuple, shift::Val{T}) where T = ntuple(j -> x[mod1(j-T, length(x))], Val(length(x)))
circshift(x::Tuple, shift::Integer) = circshift(x, Val(shift))

can suffice

Thank you for the comment @adienes. Indeed, the plan is to study the three implementations and retain the best one/s. I also want to understand the generated LL and native code. Furthermore, I want to check analogous methods and update the documentation to include the ::Val version, if required.

aravindh-krishnamoorthy avatar Dec 08 '23 22:12 aravindh-krishnamoorthy

Here's some additional analysis on type stability.

Ordinary variable shift::Integer

Code: circshift(x::Tuple, shift::Integer) = ntuple(j -> x[mod1(j-shift, length(x))], Val(length(x)))

  1. Type stable when the elements of x::Tuple are of the same type.
  2. (Understandably) type ambiguous Tuple{N, Union{...}} (red in @code_warntype) when the elements of x::Tuple have different types. See https://github.com/JuliaLang/julia/pull/52438#issuecomment-1845578963 above.

Compile-time constant shift::Integer

Code: circshift(x::Tuple, ::Val{shift}) where {shift} = ntuple(j -> x[mod1(j-shift, length(x))], Val(length(x)))

  1. Type stable in both cases (x::Tuple has same or different type elements) so long as shift is known at compile time.
  2. If shift is not known at compile time, same as (or even possibly worse per overheads than) the ordinary variable version above.
Note: that the following definition is not type stable:

circshift(x::Tuple, shift::Integer) = circshift(x, Val(shift))

julia> f(x, i) = circshift(x, Val(i))
f (generic function with 1 method)

julia> x = (1, 2, 3, 4, 5)
(1, 2, 3, 4, 5)

julia> @code_warntype f(x,2)
MethodInstance for f(::NTuple{5, Int64}, ::Int64)
  from f(x, i) @ Main REPL[8]:1
Arguments
  #self#::Core.Const(f)
  x::NTuple{5, Int64}
  i::Int64
Body::NTuple{5, Any}
1 ─ %1 = Main.circshift::Core.Const(circshift)
│   %2 = Main.Val(i)::Val
│   %3 = (%1)(x, %2)::NTuple{5, Any}
└──      return %3

(Outdated. See below.)

aravindh-krishnamoorthy avatar Dec 09 '23 21:12 aravindh-krishnamoorthy

Note: Make crashes with commit https://github.com/JuliaLang/julia/pull/52438/commits/c4a058d5af594d48fc954edbb970a5c006013e73 during build on my PC with Windows 11/WSL 2.0/i7-1165G7. Need to check and isolate the issue.

Crash log
ark:~/julia/julia$ make
    JULIA usr/lib/julia/corecompiler.ji
error during bootstrap:
LoadError(at "compiler/compiler.jl" line 3: LoadError(at "tuple.jl" line 648: LoadError(at "tuple.jl" line 671: MethodError(f=Core.Compiler.var"@inline", args=(#= Symbol("tuple.jl"):671 =#, Core.Compiler, Expr(:function, Expr(:where, Expr(:call, :circshift, Expr(:::, :x, :Tuple), Expr(:::, Expr(:curly, :Val, :shift))), :shift), Expr(:block, #= Symbol("tuple.jl"):671 =#, #= Symbol("tuple.jl"):672 =#, Expr(:::, :shift, :Integer), #= Symbol("tuple.jl"):673 =#, Expr(:return, Expr(:call, :ntuple, Expr(:->, :j, Expr(:block, #= Symbol("tuple.jl"):673 =#, Expr(:ref, :x, Expr(:call, :mod1, Expr(:call, :-, :j, :shift), Expr(:call, :length, :x))))), Expr(:call, :Val, Expr(:call, :length, :x))))))), world=0x00000000000003e0))))
jl_method_error_bare at /home/ark/julia/julia/src/gf.c:2204
jl_method_error at /home/ark/julia/julia/src/gf.c:2222
jl_invoke_julia_macro at /home/ark/julia/julia/src/ast.c:1130
jl_expand_macros at /home/ark/julia/julia/src/ast.c:1197
jl_expand_macros at /home/ark/julia/julia/src/ast.c:1209
ijl_expand_with_loc_warn at /home/ark/julia/julia/src/ast.c:1311
jl_parse_eval_all at /home/ark/julia/julia/src/toplevel.c:1062
ijl_load_ at /home/ark/julia/julia/src/toplevel.c:1112
unknown function (ip: 0x7ff9d8c8cbb9)
unknown function (ip: 0x7ff9d8c8d62d)
jl_apply at /home/ark/julia/julia/src/julia.h:2142 [inlined]
do_call at /home/ark/julia/julia/src/interpreter.c:126
eval_value at /home/ark/julia/julia/src/interpreter.c:223
eval_stmt_value at /home/ark/julia/julia/src/interpreter.c:174 [inlined]
eval_body at /home/ark/julia/julia/src/interpreter.c:656
jl_interpret_toplevel_thunk at /home/ark/julia/julia/src/interpreter.c:796
top-level scope at compiler/compiler.jl:58
jl_toplevel_eval_flex at /home/ark/julia/julia/src/toplevel.c:941
jl_eval_module_expr at /home/ark/julia/julia/src/toplevel.c:215 [inlined]
jl_toplevel_eval_flex at /home/ark/julia/julia/src/toplevel.c:741
ijl_toplevel_eval_in at /home/ark/julia/julia/src/toplevel.c:992
unknown function (ip: 0x7ff9d8c8c5e9)
jl_apply at /home/ark/julia/julia/src/julia.h:2142 [inlined]
do_call at /home/ark/julia/julia/src/interpreter.c:126
eval_value at /home/ark/julia/julia/src/interpreter.c:223
eval_stmt_value at /home/ark/julia/julia/src/interpreter.c:174 [inlined]
eval_body at /home/ark/julia/julia/src/interpreter.c:656
jl_interpret_toplevel_thunk at /home/ark/julia/julia/src/interpreter.c:796
top-level scope at compiler/compiler.jl:3
jl_toplevel_eval_flex at /home/ark/julia/julia/src/toplevel.c:941
jl_parse_eval_all at /home/ark/julia/julia/src/toplevel.c:1065
ijl_load_ at /home/ark/julia/julia/src/toplevel.c:1112
ijl_load at /home/ark/julia/julia/src/toplevel.c:1125
exec_program at /home/ark/julia/julia/src/jlapi.c:544
true_main at /home/ark/julia/julia/src/jlapi.c:601
jl_repl_entrypoint at /home/ark/julia/julia/src/jlapi.c:738
main at /home/ark/julia/julia/cli/loader_exe.c:58
unknown function (ip: 0x7ff9e5bac1c9)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at /home/ark/julia/julia/usr/bin/julia (unknown line)

make[1]: *** [sysimage.mk:64: /home/ark/julia/julia/usr/lib/julia/corecompiler.ji] Error 1
make: *** [Makefile:103: julia-sysimg-ji] Error 2

aravindh-krishnamoorthy avatar Dec 10 '23 11:12 aravindh-krishnamoorthy

Possible performance issue

Further to the type analysis above, while the low-level and native code generated for circshift(x, Val(i)) looks good on my PC Windows 11/WSL 2.0/i7-1165G7, I'm not happy with the low-level and native code generated for circshift(x, i). Am not convinced that the following the most efficient code that Julia can generate. For this case, more analysis on efficient alternative definitions for circshift is required.

Note: see edit history for generated low-level and assembly code.

Some alternatives

Note: these ideas are still under development and may not be fast or type stable. Please check back again later.

  1. ~~Avoid repeated mod1s~~ (necessary but partial solution):
function circshift(x, i)
    j = mod1(i, length(x))
    ntuple(k -> if k > j x[k-j] else x[k-j+length(x)] end, Val(length(x)))
end
  1. ~~Specialise for NTuple{N, Any}s. First run and @code_native are very slow for very long tuples ($N=400$).~~
function circshift(x::NTuple{N, Any}, i) where {N}
    j = mod1(i, N::Integer)
    ntuple(k -> if k > j x[k-j] else x[k-j+N] end, Val(N))
end
  1. ~~Convert to array, circshift, and convert back. Good for very long tuples ($N=400$) but type unstable.~~
function f(x::NTuple{N, Any}, i) where {N}
    tuple(circshift([x[i] for i=1:N], i)...)
end
  1. getindex and ifelse (no branch) version. 5.1. Type stable (to the extent possible) -- now type inference gets the output tuple length right for both single- and mixed-type tuples. 5.2. ~~Usually fully unrolled. Why?~~ (Tuple type encodes length. Hence, every function is specialised to the tuple's length, thereby enabling unrolling. Try @nospecialize if this is a problem.) 5.3. ~~getindex has inbounds check. Is this really required since axis of tuple is 1:length(x) and we're guaranteed to be within?~~ 5.4. ~~To be checked if specialisation to NTuple{N,T} (where T is not Any) improves performance. In this case, should the function be inlined in order to avoid clutter of the dispatch table (i.e. avoid a method for each length of tuple N)?~~ (Does not seem necessary, see revised 5.1 above.) 5.5. ifelse's are not vectorised and are computed individually (via scalar operations).
function circshift(x, i)
    j = mod1(i, length(x))
    ntuple(k -> Base.__safe_getindex(x, k-j+ifelse(k>j,0,length(x))), Val(length(x)))
end

aravindh-krishnamoorthy avatar Dec 11 '23 13:12 aravindh-krishnamoorthy

Should we avoid passing the length as Val(length(x)) when the length of x is not given by its type?

barucden avatar Dec 15 '23 08:12 barucden

Should we avoid passing the length as Val(length(x)) when the length of x is not given by its type?

Thank you for this comment @barucden. For tuples, as seen from the example below, it seems that the type does encode the length.

Example
julia> f(x) = length(x)
f (generic function with 1 method)

julia> x = (1, 2, 3, 4, 5)
(1, 2, 3, 4, 5)

julia> typeof(x)
NTuple{5, Int64}

julia> @code_warntype f(x)
MethodInstance for f(::NTuple{5, Int64})
  from f(x) @ Main REPL[18]:1
Arguments
  #self#::Core.Const(f)
  x::NTuple{5, Int64}
Body::Int64
1 ─ %1 = Main.length(x)::Core.Const(5)
└──      return %1


julia> y = (1, 'a', -7.0, 3)
(1, 'a', -7.0, 3)

julia> typeof(y)
Tuple{Int64, Char, Float64, Int64}

julia> @code_warntype f(y)
MethodInstance for f(::Tuple{Int64, Char, Float64, Int64})
  from f(x) @ Main REPL[18]:1
Arguments
  #self#::Core.Const(f)
  x::Tuple{Int64, Char, Float64, Int64}
Body::Int64
1 ─ %1 = Main.length(x)::Core.Const(4)
└──      return %1

aravindh-krishnamoorthy avatar Dec 15 '23 15:12 aravindh-krishnamoorthy

Hello, @adienes @jariji @jishnub @stevengj @musoke. Thank you for your encouragement. I feel that this PR is ready for review now.

Below are two items for which I request your suggestions. Besides these, other comments and suggestions on the code are also welcome.

  1. The new documentation, i.e., circshift for tuples, is located in the Base/Arrays section of the docs along with the circshift for arrays. Unfortunately, I couldn't find a tuple section in the docs. Is this fine?

  2. The current solution necessitates an evaluation of ifelse for each element of the tuple. Unfortunately, these ifelse parts are not vectorised but individually computed using scalar operations (llvm and native-i7-1165G7 code): %12 = select i1 %11, i64 5, i64 0 which maps to a bunch of scalar operations on my native machine.

Thank you and I look forward to your responses! This was a great learning experience!

aravindh-krishnamoorthy avatar Dec 15 '23 21:12 aravindh-krishnamoorthy

Am still not convinced by this solution. An alternative would be to have a circshift specialised for each shift and N.

Q: Is it Ok to have a (generated) method for each shift and N? Any other alternatives to avoid the ifelses?

aravindh-krishnamoorthy avatar Dec 23 '23 12:12 aravindh-krishnamoorthy

The ifelse calls are already optimized out (constant-folded) by LLVM when shift is a literal constant (either in the Val case or when you pass a literal inside another function, e.g. just f(x) = circshift(x, 2)). This seems like the most important case to make fast for tuples.

stevengj avatar Dec 23 '23 16:12 stevengj

Thank you for this response, @stevengj. Kindly find my observations below.

The ifelse calls are already optimized out (constant-folded) by LLVM when shift is a literal constant (either in the Val case or when you pass a literal inside another function, e.g. just f(x) = circshift(x, 2)). This seems like the most important case to make fast for tuples.

Unfortunately, the `ifelse` calls seem to be optimised out of `f(x) = circshift(x, 2)` only if `x`'s size is below (the magic number) $36$ (!) Beyond that, a generic function seems to be used (can't be seen from `@code_native`), which I suspect, based on the results of the example below, does not use constant folding. I also observe that, starting $N = 37,$ the outputs of `@code_llvm` and `@code_native` change (i.e., they `call julia_circshift_XXX` instead of inlining the code; adding `@inline` to the method definition doesn't seem to help).

From the example below (with elements of same type), we see that circshift(x::Tuple, shift::Integer) is faster than circshift(x::AbstractArray, shifts) for small $N.$ However, for large $N$, it can be up to $3\times$ slower. For large tuples with mixed element types, the function can be up to $100\times$ slower.

Example
julia> using BenchmarkTools

julia> f(x) = circshift(x, 1)
f (generic function with 1 method)

julia> x = collect(1:20) ;
julia> y = Tuple(x) ;
julia> @btime f(x) ;
  38.695 ns (2 allocations: 224 bytes)
julia> @btime f(y) ;
  18.853 ns (1 allocation: 176 bytes)

julia> x = collect(1:50) ;
julia> y = Tuple(x) ;
julia> @btime f(x) ;
  49.967 ns (2 allocations: 480 bytes)
julia> @btime f(y) ;
  54.974 ns (1 allocation: 448 bytes)

julia> x = collect(1:5000) ;
julia> y = Tuple(x) ;
julia> @btime f(x) ;
  1.427 μs (3 allocations: 39.12 KiB)
julia> @btime f(y) ;
  6.679 μs (1 allocation: 39.12 KiB)

julia>

Lastly, if the ifelse's are not constant folded, the resulting generated code is very ugly imho :(

Edit: minor updates + updated example.

Update

I'm very sorry @stevengj, my analysis above was wrong.

After adding the @inline directive to the method (commit: https://github.com/aravindh-krishnamoorthy/julia/commit/8e590cea9fb3381e7befa389445a11e1d79fc563), constant propagation indeed works for definitions of the form f(x) = circshift(x, 2) for both single and mixed-time tuples of large sizes (I tried up to $N=1000$), which you rightly identified earlier as the most important case.

Sadly, the $N=36$ constraint mentioned above is only applicable when @inline directive is not used. I must have gotten confused during my analysis despite taking precautions.

I will wait for a few days on the Julia Discourse thread (link below) and then finalise this PR. Both useful methods circshift(x, ::Val{shift}) and f(x) = circshift(x, 2) have optimal code. It's only the variable shifts version that lead to code that scales badly with $N.$

Update: I'm also not sure if the ::Val version is needed anymore since constant propagation works well. I'll look into this too.

aravindh-krishnamoorthy avatar Dec 23 '23 18:12 aravindh-krishnamoorthy

Large tuples are generally inefficient in Julia, no?

stevengj avatar Dec 26 '23 19:12 stevengj

Large tuples are generally inefficient in Julia, no?

Thank you, @stevengj. Indeed, this seems to be the case.

Note: The below analysis is only applicable to the case when shift is not a compile-time constant. See updated content in comment above (https://github.com/JuliaLang/julia/pull/52438#issuecomment-1868351598).

However, from the LLVM and native code, given that the output and input of circshift do not alias, I'm not sure why Julia waits until all the indices are computed before starting the copies (getelementptrs in LLVM). This waiting causes the intermediate indices to be stored and retrieved from the stack (spills and reloads) as well as makes the whole process sequential.

I'd be sufficiently happy if the index computation and copy are together in a block (without spills and reloads), even if the vectorisation potential is not exploited.

~~Perhaps I should seek inputs of Julia -> LLVM translators on this?~~ This is the real cause of the slowdown for circshift of large tuples, it seems.

Discourse question: https://discourse.julialang.org/t/generated-llvm-code-causes-register-spills-on-x86-64/108144

aravindh-krishnamoorthy avatar Dec 28 '23 21:12 aravindh-krishnamoorthy

Tuple length

Instead of computing length(x) in the body of the closure, you could instead dispatch on the length, i.e., use x::NTuple{len,Any} instead of x::Tuple. Alternatively, just be smarter about what you capture in the closure, like so:

len = length(x)
f = let len = len, x = x
    k -> ...
end
ntuple(f, Val(len))

__safe_getindex

Why do you use __safe_getindex, instead of the usual, higher-level interface? Is it actually necessary?

The ::Integer method and the ::Val method

If you're going to have a ::Val method, why don't you implement it directly so the implementation would make full use of the fact that the shift is in the type domain.

You could somewhat avoid code duplication by being smart about how you construct the closure.

nsajko avatar Dec 29 '23 00:12 nsajko

BTW, the PR seems to have conflicts.

nsajko avatar Dec 29 '23 00:12 nsajko

Hello @nsajko. Thank you very much for your comments on the code. Kindly find my response inline below.

Tuple length

Instead of computing length(x) in the body of the closure, you could instead dispatch on the length, i.e., use x::NTuple{len,Any} instead of x::Tuple. Alternatively, just be smarter about what you capture in the closure, like so:

Indeed, one of the candidate solutions above utilizes x::Tuple{N,Any}. Using length(x) does not seem to be a problem because, for tuples, it is a compile-time constant which can be obtained from the type. Utilizing x::Tuple{N,Any} or length(x) results in the same LLVM and native code.

__safe_getindex

Why do you use __safe_getindex, instead of the usual, higher-level interface? Is it actually necessary?

Using __safe_getindex just avoids the bound checks since we're sure we do not go out of bounds in the code.

The ::Integer method and the ::Val method

If you're going to have a ::Val method, why don't you implement it directly so the implementation would make full use of the fact that the shift is in the type domain.

You could somewhat avoid code duplication by being smart about how you construct the closure.

I learned about Julia's constant propagation during the implementation of this PR. What I observed is that constant propagation already ensures optimal code for the ::Val method, so I did not have to duplicate code. One can check this via @code_native circshift(x, Val(1)) for some small tuple x, where it is seen that the resulting code is fully vectorised (SSE/2 on Intel CPUs) and nearly optimal.

Furthermore, the ::Integer conversion generates a TypeError when invoking the function as, e.g., circshift(x, Val('a')). Without it we would get a MethodError, which might confuse a user.

Please let me know if you find my responses satisfactory. I must admit I did not fully understand your comment on closures. In case I've missed any crucial aspect of your post, kindly let me know.

Lastly, thank you for the info on the conflict. Seems it just happened with recent commits. I will fix it at the earliest.

aravindh-krishnamoorthy avatar Dec 29 '23 01:12 aravindh-krishnamoorthy

All done 👍

aravindh-krishnamoorthy avatar Jan 16 '24 22:01 aravindh-krishnamoorthy

as far as I can tell, this would be the first usage of __safe_getindex anywhere in the entire Julia source code. I empathize with the desire for maximum performance, but there are many many functions on Tuple happily making use of regular getindex, it's not clear to me why this function should be privileged so

adienes avatar Jan 16 '24 23:01 adienes

__safe_getindex(@nospecialize(t::Tuple), i::Int) = (@_nothrow_noub_meta; getfield(t, i, false))

which calls

# can be used in place of `@assume_effects :nothrow :noub` (supposed to be used for bootstrapping)
macro _nothrow_noub_meta()
  ...

so I wonder if @assume_effects :nothrow :noub would work here?

jariji avatar Jan 17 '24 00:01 jariji

as far as I can tell, this would be the first usage of __safe_getindex anywhere in the entire Julia source code. I empathize with the desire for maximum performance, but there are many many functions on Tuple happily making use of regular getindex, it's not clear to me why this function should be privileged so

Thank you for the comment @adienes. Indeed, I just chose __safe_getindex because (a) it was available and (b) it was the method that provided the best performance.

However, the performance improvement does indeed come at a cost, as shown below.

julia> x = (1, 2, 3, 4, 5)
(1, 2, 3, 4, 5)

julia> struct FakeInteger <: Integer x::Integer end
julia> Base.mod1(x::FakeInteger, y::T) where T<:Real = x.x

julia> circshift(x, FakeInteger(100000000000000))
[7699] signal 11 (128): Segmentation fault
in expression starting at REPL[5]:1
macro expansion at ./ntuple.jl:72 [inlined]
ntuple at ./ntuple.jl:69
[...]

The crash occurs because, in this case, mod1 does not keep its promise of returning a number below its second argument.

If you and/or other reviewers feel that, on balance, it's better to use getindex with bounds check to cover such extreme corner cases, I will be happy to change this. Please note that, as seen from @code_llvm and @code_native, bounds check will add several $10$ s of cycles for every index¹.

Note²: __safe_getindex was added to tuple.jl about $2$ months ago in #52186, which might explain why it's not widely used. Please see the "Blame" version here: https://github.com/JuliaLang/julia/blame/master/base/tuple.jl and search for __safe_getindex to see the code evolution.

Edits are marked with unicode superscripts: ¹, ².

aravindh-krishnamoorthy avatar Jan 17 '24 01:01 aravindh-krishnamoorthy

This one's also ready for review now IMHO. Issue #52933 was raised based on the intuitions gained in this PR. However, these changes can be merged independently of the issue. Comments and suggestions are welcome.

Note: I've removed the ::Val version as Julia's constant propagation was already producing efficient code when shift is a compile-time constant.

aravindh-krishnamoorthy avatar Feb 03 '24 22:02 aravindh-krishnamoorthy

as far as I can tell, this would be the first usage of __safe_getindex anywhere in the entire Julia source code. I empathize with the desire for maximum performance, but there are many many functions on Tuple happily making use of regular getindex, it's not clear to me why this function should be privileged so

Hello @adienes. Following your comment and my observation above with compile-time constant shifts, I have changed the implementation from __safe_getindex to getindex. My reasoning is that bounds check with getindex is essentially invisible (in terms of overheads) when shift is a compile-time constant. On the other hand, when shift is not a compile-time constant, the inefficiencies in the surrounding code, see #52933, drown the overheads due to bounds check by far.

Hope this change makes the implementation a bit more palatable.

aravindh-krishnamoorthy avatar Feb 14 '24 14:02 aravindh-krishnamoorthy

that makes sense to me. I think you have looked at the benchmarks a lot more carefully than I so I trust whatever conclusion you arrive at. (I'll note that the @inline is most likely unnecessary, but also I don't think it harms anything either way)

I'm happy to "approve" but I have no real authority so it will be nothing more than a fun green checkmark for someone with merge rights to find 😁

adienes avatar Feb 14 '24 15:02 adienes

Hello @jishnub. I've combined the documentation of circshift for arrays and tuples as advised here: https://github.com/JuliaLang/julia/pull/52438#discussion_r1428747437. Kindly let me know in case you have any further comments.

aravindh-krishnamoorthy avatar Feb 14 '24 15:02 aravindh-krishnamoorthy

that makes sense to me. I think you have looked at the benchmarks a lot more carefully than I so I trust whatever conclusion you arrive at. (I'll note that the @inline is most likely unnecessary, but also I don't think it harms anything either way)

I'm happy to "approve" but I have no real authority so it will be nothing more than a fun green checkmark for someone with merge rights to find 😁

Thank you @adienes! I've learnt a lot through this PR. But, I guess I approach every problem as though it's an inner loop optimisation problem 😂 I guess after this PR it's better to stick with linear algebra and other stuff I know well 😁

aravindh-krishnamoorthy avatar Feb 15 '24 00:02 aravindh-krishnamoorthy