julia icon indicating copy to clipboard operation
julia copied to clipboard

Fix an off-by-one error in interpreter's `do_invoke`

Open fatteneder opened this issue 1 year ago • 6 comments

fatteneder avatar May 11 '24 19:05 fatteneder

Does this need a test?

giordano avatar May 11 '24 22:05 giordano

Especially since there have been multiple pushes after the pull request was accepted, showing that (1) that version wasn't ready and (2) there are no tests making sure the code works as expected.

giordano avatar May 11 '24 22:05 giordano

The previous version did OOB indexing and so was UB. ~~IDK what kind of test would examine that behavior reliably. Do you have an idea for a test?~~ So there is no reliable test to examine that problem.

fatteneder avatar May 11 '24 23:05 fatteneder

One possible test could try to call do_invoke with a very large nargs so that the OOB access would write across a page boundary and cause a segfault.

Seelengrab avatar May 12 '24 12:05 Seelengrab

Here is an attempt to generate a test case:

# segfault.jl

macro generate_test_case(n)
  as = [ Symbol("a$i") for i in 1:n ]
  compute = Meta.parse(join(as, '+'))
  return esc(quote
    @noinline function sum_54443($(as...))
      return $compute
    end
    @noinline function wrapped_sum_54443($(as...))
      return sum_54443($(as...))
    end
  end)
end

macro call_test_case(n)
  unpack_as = [ :(as[$i]) for i in 1:n ]
  return esc(quote
    as = [ randn() for _ in 1:$n ]
    wrapped_sum_54443($(unpack_as...))
  end)
end

@generate_test_case(10)
@call_test_case(10)

This generates what I think is a problematic invoke call

julia> code_typed(wrapped_sum_54443, NTuple{10,Int64}; optimize=true) # note optimize=true!!
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = invoke Main.sum_54443(a1::Int64, a2::Int64, a3::Int64, a4::Int64, a5::Int64, a6::Int64, a7::Int64, a8::Int64, a9::Int64, a10::Int64)::Int64
└──      return %1
) => Int64

However, I have the problem that I don't even know how to call into do_invoke. I tested with adding an artificial jl_error(); to do_invoke() and then run the above as julia --compile=min segfault.jl, but without failure.

Adding more printfs tells me that the interpreter is still doing something in jl_eval_value, but just never ends up in do_invoke. Could it be that the interpreter just sees the CodeInfo generated with optimize=false which contains no explicit :invoke?

julia> code_typed(wrapped_sum_54443, NTuple{10,Int64}; optimize=false)
1-element Vector{Any}:
 CodeInfo(
1 ─      nothing::Core.Const(nothing)
│   %2 = Main.sum_54443(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10)::Int64
└──      return %2
) => Int64

I am really just guessing here ...

fatteneder avatar May 12 '24 17:05 fatteneder

Adding more printfs tells me that the interpreter is still doing something in jl_eval_value, but just never ends up in do_invoke

It actually ends up in do_call doing the computation, which IIUC is the dynamic dispatch and do_invoke would be static dispatch. So is there a way to force static dispatch in the interpreter?

fatteneder avatar May 12 '24 20:05 fatteneder

One possible test could try to call do_invoke with a very large nargs so that the OOB access would write across a page boundary and cause a segfault.

According to the embedding section of the docs the JL_GC_PUSH... macros do no heap allocation, but instead store the elements on the C stack, cf. https://docs.julialang.org/en/v1/manual/embedding/#Memory-Management. So I think this strategy can't be used either to trigger a segfault.


@vtjnash May I ask you for another review and a decision on whether this needs a test or not? If it needs a test, do you have any suggestions on how to approach this?

fatteneder avatar May 14 '24 07:05 fatteneder

OOB indexing is not UB. Anyways, this version also looks fine

vtjnash avatar May 14 '24 16:05 vtjnash

In theory we would use a static analyzer type of test here, but I don't think we provide the bounds information to the analyzer right now, or if it would try to reason about the symbolic bound here if we did

vtjnash avatar May 14 '24 16:05 vtjnash

The ASAN build tests in CI also should fail if this code ever got used, but we have no tests currently written for it specifically

vtjnash avatar May 14 '24 16:05 vtjnash

OOB indexing is not UB.

Why? Isn't the Access out of bounds examples from https://en.cppreference.com/w/c/language/behavior equivalent to the problem here?


The ASAN build tests in CI also should fail if this code ever got used, but we have no tests currently written for it specifically

Nice. So I just need to figure out how to call into do_invoke.

fatteneder avatar May 14 '24 18:05 fatteneder

Julia unlike C bounds-checks by default, so it's just an error, not UB.

oscardssmith avatar May 14 '24 18:05 oscardssmith

Access is UB, but indexing is generally valid (and, as a special case, indexing one past the last element is always considered valid)

vtjnash avatar May 14 '24 18:05 vtjnash

Thanks. Very interesting, I did not know about that. Here is also a SO post with a quote from the C99 standard that clarifies this: https://stackoverflow.com/a/3145001

fatteneder avatar May 14 '24 18:05 fatteneder

Access is UB, but indexing is generally valid

Deref when i is OOB is UB. I don't follow what you mean with "access" and "indexing".

and, as a special case, indexing one past the last element is always considered valid

That's the first I'm hearing of this; could you share the relevant section of the standard? To clarify, are you talking about something like foo = &arr[i] (which is defined not to semantically deref in C, so isn't UB), or are you talking about foo = arr[i], which does semantically deref (and is hence UB if i would be OOB)?

Here is also a SO post with a quote from the C99 standard that clarifies this: https://stackoverflow.com/a/3145001

Note that the SO post is specifically about &arr[i], i.e. taking the address of a would-be array element. The quoted section of the standard clarifies that this is the same as &(*(arr + i)), which is defined to be the same as just arr + i, i.e. there is no dereference taking place semantically (which is what would trigger the UB).

See also this SO post about the situation in C++.

Seelengrab avatar May 14 '24 22:05 Seelengrab

This fixed https://github.com/JuliaLang/julia/issues/54054 apparently if you want to use it as a test.

gbaraldi avatar May 17 '24 19:05 gbaraldi

Ah, the test checking #54054 already exists, but no one noticed it because aarch64-linux-gnu is allowed to fail (and it has been failing for way too long).

giordano avatar May 17 '24 19:05 giordano