julia RFC: Refactor MethodInstance to allow for more general specialization

Overview

This refactors the base MethodInstance data structure to the following:

mutable struct MethodSpecialization{D}
    # If def is a MethodSpecialization, inherits edges from parent
    const def::Union{Module, Method, MethodSpecialization}
    specTypes::Type{<:Tuple} # To be renamed `abi` in the future
    backedges::Vector
    cache::CodeInstance
    next::MethodSpecialization # N.B.: No {D}
    data::D
end

struct DefaultSpec
    sparam_vals::SimpleVector
    inInference::UInt8
    cache_with_orig::UInt8
    precompiled::UInt8
end

struct UninferredSpec; end # Replaces owner === :uninferred

const MethodInstance = MethodSpecialization{DefaultSpec}

The owner field of CodeInstance is removed in favor of using separate toplevel MethodInstances.
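To make the shape of the proposal concrete, here is a minimal, self-contained sketch of the parametric layout using toy types. Everything prefixed with `Toy`, as well as `is_default`, is a hypothetical stand-in and not the real Core definition; `cache` and `next` are left out so that the struct can be constructed without a real CodeInstance. It assumes Julia 1.8 or later for `const` fields in mutable structs.

```julia
# Toy stand-ins only: the real types live in Core and are defined by the runtime.
struct ToyDefaultSpec
    sparam_vals::Core.SimpleVector
    inInference::UInt8
    cache_with_orig::UInt8
    precompiled::UInt8
end

struct ToyUninferredSpec end

# `cache` and `next` are omitted to keep the sketch constructible.
mutable struct ToySpecialization{D}
    const def::Union{Module, Method, ToySpecialization}
    specTypes::Type{<:Tuple}
    backedges::Vector{Any}
    data::D
end

# The familiar name becomes an alias for the default-tagged specialization.
const ToyMethodInstance = ToySpecialization{ToyDefaultSpec}

# Code that only understands the default tag dispatches on the alias ...
is_default(::ToyMethodInstance) = true
# ... and every other tag falls through to the generic method.
is_default(::ToySpecialization) = false

m = first(methods(identity))
sig = Tuple{typeof(identity), Int}
mi = ToySpecialization(m, sig, Any[], ToyDefaultSpec(Core.svec(Int), 0x00, 0x00, 0x00))
un = ToySpecialization(Main, Tuple{}, Any[], ToyUninferredSpec())
```

In this sketch the tag lives in the type parameter, so ordinary dispatch is enough to separate default specializations from all others.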

Motivation

This refactor aims to unify a number of recent requirements on the internal cache. Broadly speaking, we'd like to cache (with proper invalidation and world age semantics) several classes of data:

1. World-age partitioned, type-specialized native code instances (the traditional MethodInstance/CodeInstance cache)
2. World-age partitioned, type-specialized non-native code instances (the GPUCompiler use case)
3. World-age partitioned, type-specialized non-inferred code (e.g. generated function results, some expensive-to-compute intermediate results in external absint)
4. World-age partitioned, finer-than-type (think constant arguments, return values, fancier external absint specializations) native and non-native code instances
5. World-age partitioned, type-specialized derived code instances (e.g. effect preconditions, see #50641)

Now, some of these are expected to be compiled by the standard Julia execution engine (1, 4, 5), and some have ABIs that match the type specialization (1, 2, 3), but in general they do not all share these properties. Most of these are invalidated along with the original method instance, but not all. Additionally, some of these (1, 4, 5) likely have more edges than the default method instance, leading to over-invalidation.

Recently, we added the owner field to CodeInstance, which allowed us to put all of these into the cache, but that didn't give them support for being compiled/executed. I tried to fix that in #52797, but we didn't manage to figure out good precompile semantics, so that stalled.

This PR pulls the owner field up one level into the type tag of MethodSpecialization. This is partly to save the extra pointer in every CodeInstance, but also to allow partitioning the edges between native MethodInstances and those used by external abstract interpreters. There are a few different usage modes:

The external absint sets def to a Method. In this case, the set of edges is completely partitioned between the internal and external absint and can be managed according to absint requirements (e.g. this makes sense if the external absint is using an overlay table).
The external absint sets def to another MethodSpecialization. In this case the set of edges is extended. This is intended to be used by absints that wrap another absint and produce more fine-grained specializations.
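A rough sketch of how the two modes might differ, under the assumption that "extended" means a specialization sees its own edges plus those of every parent reachable through `def`. The edge/invalidation logic is not implemented in this PR, so this is only one possible reading. `EdgeSpec`, the tag types, `effective_edges`, and the use of plain symbols as edges are all hypothetical.

```julia
# Toy model: edges are plain symbols here, not real method instances.
mutable struct EdgeSpec{D}
    const def::Union{Method, EdgeSpec}
    edges::Vector{Any}
    data::D
end

struct NativeTag end
struct OverlayTag end   # hypothetical external absint with its own overlay table
struct RefinedTag end   # hypothetical absint wrapping the native one

# Own edges first, then the edges of each parent specialization reached via `def`.
function effective_edges(spec::EdgeSpec)
    edges = copy(spec.edges)
    parent = spec.def
    while parent isa EdgeSpec
        append!(edges, parent.edges)
        parent = parent.def
    end
    return edges
end

m = first(methods(identity))
native  = EdgeSpec(m, Any[:f, :g], NativeTag())
# Mode 1: def is a Method, so the edge set is separate from the native one.
overlay = EdgeSpec(m, Any[:h], OverlayTag())
# Mode 2: def is another specialization, so the parent's edges are inherited.
refined = EdgeSpec(native, Any[:k], RefinedTag())
```

Here `effective_edges(overlay)` contains only `:h`, while `effective_edges(refined)` contains `:k` followed by the native edges `:f` and `:g`.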

Additionally, all non-MethodInstance MethodSpecializations are allowed in Expr(:invoke). There's some TBD still for how to handle recovery on reload, but in principle everything should just go through. This thus closes #52797 as it addresses the same use case, but with proper edge tracking.

Currently, there are two D tags that the runtime system uses: DefaultSpec, which holds the memoization of sparams and the various lock bits that the runtime uses, and UninferredSpec, which is a singleton and replaces the owner === :uninferred CodeInstances introduced in #54362. I anticipate extending this further for effect preconditions and various finer-than-type specializations in Base.

Current Status

This PR changes the data structures, but does not yet provide the Core.Compiler utilities for cache lookup in non-default MethodSpecializations. That's on my immediate to-do list. Additionally, the new edge/invalidation logic described above is not yet implemented. I also haven't tried the #52797 replacement yet to make sure it actually works properly. I'm putting this up as a draft to make sure that all relevant package developers have a chance to complain if I missed something important.

May 06 '24 08:05 Keno

Making them compilable/executable is #52964 since that requires tracking the current compiler instance through dynamic dispatch.

It requires tracking them through dynamic dispatch if you want dynamic dispatch to do something other than the default. That's an orthogonal feature to this. This just lets you have multiple specializations with different ABIs for one particular method. They may :invoke each other, but if they dynamic dispatch, that still goes through the default compiler. #52964 is still desirable, but a separate concern (though it may be able to re-use part of the mechanism here).

I suppose in your proposal we would add a GPUSpec or ForeignSpec?

Yes

But right now, in many places the code assumes that it is DefaultSpec.

Yes, I need to go through and disambiguate the assumptions, but that's a fair bit of work, so I wanted to get agreement on the direction first.

Part of my goal with owner is that we could re-use the code in many places and reduce the reliance on solutions like spinning up a separate OrcJIT.

That's still in scope, but again, not addressed by this. I think the cleanest way to address that is to also make CodeInstance parametric, so that the .inferred field can be something other than julia IR (e.g. LLVM IR) with corresponding helpers that know how to serialize that.

May 06 '24 22:05 Keno

That's still in scope, but again, not addressed by this. I think the cleanest way to address that is to also make CodeInstance parametric, so that the .inferred field can be something other than julia IR (e.g. LLVM IR) with corresponding helpers that know how to serialize that.

Yeah the challenge here is to figure out what fields have meaning and what is needed.

What are the requirements for something going through the native pipeline (but not wanting to poison it)? The ci->owner goal was to maximize re-use up until it becomes hard.

Almost everyone reuses inferred to store the result of the high-level pipeline. Most GPUCompiler targets don't need specptr & co, but Enzyme would reuse those if we add the option to customize the LLVM pipeline.

May 07 '24 13:05 vchuravy

There are a few different usage modes:

The external absint sets def to a Method. In this case, the set of edges is completely partitioned between the internal and external absint and can be managed according to absint requirements (e.g. this makes sense if the external absint is using an overlay table).

The external absint sets def to another MethodSpecialization. In this case the set of edges is extended. This is intended to be used by absints that wrap another absint and produce more fine-grained specializations.

Is this behavior already implemented, or is it something that needs to be implemented?

Also, I'm curious about how this change should affect the existing cache overload system based on cache_owner. If external abstract interpreters use InternalCodeCache, cache_owner no longer seems to be necessary. And if an interpreter uses an external code cache, only code_cache will be needed, so this makes the system look like it did before.

May 14 '24 15:05 aviatesk

Is this behavior already implemented, or is it something that needs to be implemented?

It's partially implemented. Some caching/invalidation/precompile logic is missing.

Also, I'm curious about how this change should affect the existing cache overload system based on cache_owner

InternalCodeCache takes a specialization type, which replaces the cache_owner mechanism.

May 15 '24 04:05 Keno

I don't yet see the end design and how it will work. An example of a non-DefaultSpec specialization might help me understand the goal.

Right now I think that the owner concept and this mechanism are orthogonal to each other. In particular, I am worried that the polymorphism here will be limited since the C code only expects DefaultSpec.

Maybe the answer is for Base to provide OtherSpec that has an owner field?

My original attempt when working on the integrated cache started with adding the owner field on the MethodInstance, but that was awkward since we use mi as a "result type" of a method query, and at least in the GPUCompiler world we first performed the query independently and then used it to drive compilation.

May 15 '24 05:05 vchuravy

Right now I think that the owner concept and this mechanism are orthogonal to each other. In particular, I am worried that the polymorphism here will be limited since the C code only expects DefaultSpec.

The C code is mostly fine. I've been using it extensively with non-DefaultSpec things. There are a few places that still need to be updated, but it largely works fine.

Maybe the answer is for Base to provide OtherSpec that has an owner field?

That would be fine.

My original attempt when working on the integrated cache started with adding the owner field on the MethodInstance, but that was awkward since we use mi as a "result type" of a method query, and at least in the GPUCompiler world we first performed the query independently and then used it to drive compilation.

Yeah, it's slightly awkward, but you can still do that and just look through the next field as InternalCodeCache does in this PR.
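For readers following along, "looking through the next field" could be pictured roughly as below: start from the specialization returned by the method query and walk the `next` chain until one with the requested tag is found. This is a toy sketch, not the PR's InternalCodeCache code. `ChainSpec`, `find_spec`, and the tag types are hypothetical, and the toy uses `nothing` to terminate the chain, which is an assumption about representation only.

```julia
# Toy linked list of specializations; `next` is untyped in D, as in the proposal.
mutable struct ChainSpec{D}
    data::D
    next::Union{ChainSpec, Nothing}
end

struct ChainDefault end
struct ChainGPU end          # hypothetical external tag
struct ChainUninferred end

# Walk the chain from `head` and return the first specialization tagged with D.
function find_spec(head::ChainSpec, ::Type{D}) where {D}
    cur = head
    while cur !== nothing
        cur isa ChainSpec{D} && return cur
        cur = cur.next
    end
    return nothing
end

# The method query yields the default specialization; others hang off `next`.
gpu = ChainSpec(ChainGPU(), nothing)
mi  = ChainSpec(ChainDefault(), gpu)
```

With this setup, `find_spec(mi, ChainGPU)` returns `gpu`, and `find_spec(mi, ChainUninferred)` returns `nothing`, so the query can still be performed first and the tag-specific specialization resolved afterwards.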

May 15 '24 05:05 Keno