julia allow external absint to hold custom data in `codeinst.inferred`

It has been possible for external abstract interpreters to keep custom data in codeinst.inferred (together /w overloading inlining_policy). After JuliaLang/julia#52233, when such external absint uses InternalCodeCache, this data is passed to jl_ir_flag_inferred, leading to segfaults in assertion builds.

This commit resolves the issue by omitting jl_ir_flag_inferred checks when the cache_owner is external. Nonetheless, a better resolution might be necessary. It suggests that the current design of code_owner and InternalCodeCache for the external cache system is somewhat flawed. A conceivable approach could involve:

Adding a layer similar to inlining_policy in CC.get(::WorldView{InternalCodeCache}) to enable safe redirection of custom data to the native interpreter's implementation.
Prohibiting custom data in the inferred field and directing such data to be kept in analysis_results.

Feb 12 '24 16:02 aviatesk

Does #53219 resolve this? It removes the need to handle code-instances in this location.

#52964 does require that code.inferred is something codegen and the interpreter can understand, so stashing additional data in .inferred is problematic and I would prefer custom data to go into analysis_results.

Feb 12 '24 16:02 vchuravy

In https://github.com/JuliaDebug/Cthulhu.jl/pull/540 I continued using the external code cache instead of using the internal code cache (but the external code cache no longer participates in invalidation logic)

Feb 12 '24 16:02 vchuravy

Does #53219 resolve this? It removes the need to handle code-instances in this location.

Probably. I'm looking into it.

In JuliaDebug/Cthulhu.jl#540 I continued using the external code cache instead of using the internal code cache (but the external code cache no longer participates in invalidation logic)

Cthulhu has traditionally created a new cache for each instance of CthulhuInterpreter, which meant there was no inherent support for invalidation required. On a personal note, I'm inclined to retain it as it facilitates easier reflection of outcomes following modifications to the base abstract interpretation's implementation.

Feb 12 '24 17:02 aviatesk

I think it's fine to have non-CodeInfo objects in (::CodeInstance).inferred. That's an explicit goal of https://github.com/JuliaLang/julia/pull/53219. https://github.com/JuliaLang/julia/pull/53219 also removes the (::CodeInfo).inferred field, which fixes the issue in this PR. It also fixes the underlying issue where the compiler is looking at the wrong CodeInstance in some cases. The only constraint we do need is that for owner==nothing, the CodeInstance's inferred field needs to be something the compiler understands.

Feb 12 '24 17:02 Keno

Yeah I'll work on the PR first and then we only need the test cases from this PR.

Feb 13 '24 04:02 aviatesk

I know it might sound like I'm flip-flopping, but I want to move forward with this PR as is. This is because, as things stand post-#52233, virtually all external abstract interpreters utilizing the external code cache with custom data are rendered inoperative on assertion builds. Given that the code in question will be eliminated by #53219 anyway in the future, I think this PR doesn't make any harm and merging this for now alone can be justified with the added test cases.

Feb 13 '24 12:02 aviatesk

Yeah, since jl_ir_flag_inferred is deleted in https://github.com/JuliaLang/julia/pull/53219 it may conflict somewhat with that PR content, but seems safe to merge this now as it is mostly just added tests

Feb 17 '24 17:02 vtjnash

To address Valentin's comment: it looks like this doesn't really express an opinion specifically on whether it is acceptable to use this field, but simply attempts to address the possibility of a missing type assert, which would be valid to check there regardless of whether we later state that other content should be put here

Thinking more broadly, I wonder if we should restructure this layering of types to instead look like this, so that there is a more bright line distinction in the types what is volatile and what is constant:

struct MethodInstance #= mostly unchanged =# end

mutable struct CodeCache # renamed from CodeInstance to indicate this is volatile
    parent::MethodInstance # for show reflection
    #= most fields from existing CodeInstance =#
   ...
    @atomic inferred::Union{CodeInfo, CompressedCodeInfo}
end

struct CodeInfo # partially kept as-is, but extended to allow external stmt representations
    #= fields from existing CodeInstance that are properties of the whole function =#
    ...
    parent::MethodInstance # for show reflection
    stmts::Union{StmtInfo, OpaqueExternalObjects, Nothing} # custom info goes here
end

struct StmtInfo # split from CodeInfo
    #= fields from existing CodeInstance that are arrays of per stmt info =#
   ...
end

Feb 17 '24 18:02 vtjnash