julia
julia copied to clipboard
Fix an off-by-one error in interpreter's `do_invoke`
Does this need a test?
Especially since there have been multiple pushes after the pull request was accepted, showing that (1) that version wasn't ready and (2) there are no tests making sure the code works as expected.
The previous version did OOB indexing and so was UB. ~~IDK what kind of test would examine that behavior reliably. Do you have an idea for a test?~~ So there is no reliable test to examine that problem.
One possible test could try to call do_invoke with a very large nargs so that the OOB access would write across a page boundary and cause a segfault.
Here is an attempt to generate a test case:
# segfault.jl
macro generate_test_case(n)
as = [ Symbol("a$i") for i in 1:n ]
compute = Meta.parse(join(as, '+'))
return esc(quote
@noinline function sum_54443($(as...))
return $compute
end
@noinline function wrapped_sum_54443($(as...))
return sum_54443($(as...))
end
end)
end
macro call_test_case(n)
unpack_as = [ :(as[$i]) for i in 1:n ]
return esc(quote
as = [ randn() for _ in 1:$n ]
wrapped_sum_54443($(unpack_as...))
end)
end
@generate_test_case(10)
@call_test_case(10)
This generates what I think is a problematic invoke call
julia> code_typed(wrapped_sum_54443, NTuple{10,Int64}; optimize=true) # note optimize=true!!
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = invoke Main.sum_54443(a1::Int64, a2::Int64, a3::Int64, a4::Int64, a5::Int64, a6::Int64, a7::Int64, a8::Int64, a9::Int64, a10::Int64)::Int64
└── return %1
) => Int64
However, I have the problem that I don't even know how to call into do_invoke.
I tested with adding an artificial jl_error(); to do_invoke() and then run the above as julia --compile=min segfault.jl, but without failure.
Adding more printfs tells me that the interpreter is still doing something in jl_eval_value, but just never ends up in do_invoke.
Could it be that the interpreter just sees the CodeInfo generated with optimize=false which contains no explicit :invoke?
julia> code_typed(wrapped_sum_54443, NTuple{10,Int64}; optimize=false)
1-element Vector{Any}:
CodeInfo(
1 ─ nothing::Core.Const(nothing)
│ %2 = Main.sum_54443(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10)::Int64
└── return %2
) => Int64
I am really just guessing here ...
Adding more
printfs tells me that the interpreter is still doing something injl_eval_value, but just never ends up indo_invoke
It actually ends up in do_call doing the computation, which IIUC is the dynamic dispatch and do_invoke would be static dispatch. So is there a way to force static dispatch in the interpreter?
One possible test could try to call
do_invokewith a very largenargsso that the OOB access would write across a page boundary and cause a segfault.
According to the embedding section of the docs the JL_GC_PUSH... macros do no heap allocation, but instead store the elements on the C stack, cf. https://docs.julialang.org/en/v1/manual/embedding/#Memory-Management. So I think this strategy can't be used either to trigger a segfault.
@vtjnash May I ask you for another review and a decision on whether this needs a test or not? If it needs a test, do you have any suggestions on how to approach this?
OOB indexing is not UB. Anyways, this version also looks fine
In theory we would use a static analyzer type of test here, but I don't think we provide the bounds information to the analyzer right now, or if it would try to reason about the symbolic bound here if we did
The ASAN build tests in CI also should fail if this code ever got used, but we have no tests currently written for it specifically
OOB indexing is not UB.
Why?
Isn't the Access out of bounds examples from https://en.cppreference.com/w/c/language/behavior equivalent to the problem here?
The ASAN build tests in CI also should fail if this code ever got used, but we have no tests currently written for it specifically
Nice. So I just need to figure out how to call into do_invoke.
Julia unlike C bounds-checks by default, so it's just an error, not UB.
Access is UB, but indexing is generally valid (and, as a special case, indexing one past the last element is always considered valid)
Thanks. Very interesting, I did not know about that. Here is also a SO post with a quote from the C99 standard that clarifies this: https://stackoverflow.com/a/3145001
Access is UB, but indexing is generally valid
Deref when i is OOB is UB. I don't follow what you mean with "access" and "indexing".
and, as a special case, indexing one past the last element is always considered valid
That's the first I'm hearing of this; could you share the relevant section of the standard? To clarify, are you talking about something like foo = &arr[i] (which is defined not to semantically deref in C, so isn't UB), or are you talking about foo = arr[i], which does semantically deref (and is hence UB if i would be OOB)?
Here is also a SO post with a quote from the C99 standard that clarifies this: https://stackoverflow.com/a/3145001
Note that the SO post is specifically about &arr[i], i.e. taking the address of a would-be array element. The quoted section of the standard clarifies that this is the same as &(*(arr + i)), which is defined to be the same as just arr + i, i.e. there is no dereference taking place semantically (which is what would trigger the UB).
See also this SO post about the situation in C++.
This fixed https://github.com/JuliaLang/julia/issues/54054 apparently if you want to use it as a test.
Ah, the test checking #54054 already exists, but no one noticed it because aarch64-linux-gnu is allowed to fail (and it has been failing for way too long).