RFC: Refactor MethodInstance to allow for more general specialization
Overview
This refactors the base MethodInstance data structure to the following:
mutable struct MethodSpecialization{D}
# If def is a MethodSpecialization, inherits edges from parent
const def::Union{Module, Method, MethodSpecialization}
specTypes::Type{<:Tuple} # To be renamed `abi` in the future
backedges::Vector
cache::CodeInstance
next::MethodSpecialization # N.B.: No {D}
data::D
end
struct DefaultSpec
sparam_vals::SimpleVector
inInference::UInt8
cache_with_orig::UInt8
precompiled::UInt8
end
struct UninferredSpec; end # Replaces owner === :uninferred
const MethodInstance = MethodSpecialization{DefaultSpec}
The owner field of CodeInstance is removed in favor of using
separate toplevel MethodInstances.
Motivation
This refactor aims to unify a number of recent requirements on the internal cache. Broadly speaking, we'd like to cache (with proper invalidation and world age semantics) several classes of data:
- World-age partitioned, type-specialized native code instances (the traditional MethodInstance/CodeInstance cache)
- World-age partitioned, type-specialized non-native code instances (the GPUCompiler use case)
- World-age partitioned, type-specialized non-inferred code (e.g. generated function results, some expensive-to-compute intermediate results in external absint)
- World-age partitioned, finer-than-type (think constant arguments, return values, fancier external absint specializations) native and non-native code instances
- World-age partitioned, type-specialized derived code instance (e.g. effect preconditions, see #50641)
Now, some of these are expected to be compiled by the standard Julia execution engine (1, 4, 5), some of these have ABIs that match the type specialization (1, 2, 3), but generally they are not all the same. Most of these are invalidated along with the original method instance, but not all. Additionally, some of these (1, 4, 5) likely have more edges than the default method instance, leading to over-invalidation.
Recently, we added the owner field to CodeInstance, which allowed us
to put all of these into the cache, but that didn't give them support
to be compiled/executed. I tried to fix that in #52797, but we didn't
manage to figure out good precompile semantics, so that stalled.
This PR pulls up the owner field one level into the type tag of
MethodSpecialization. This is partly to save the extra pointer in
every CodeInstance, but also to allow partitioning the edges between
native MethodInstances and those used by external abstract interpreters.
There are a few different usage modes:
- The external absint sets def to a Method. In this case, the set of edges is completely partitioned between the internal and external absint and can be managed according to absint requirements (e.g. this makes sense if the external absint is using an overlay table).
- The external absint sets def to another MethodSpecialization. In this case the set of edges is extended. This is intended to be used by absints that wrap another absint and produce more fine-grained specializations.
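To make the two modes concrete, here is a minimal toy sketch in plain Julia. None of these types are the real Core structures: ToyMethod, EdgeSpec, the tag types, and effective_backedges are all hypothetical names invented for illustration, and the edge semantics shown (a child sees its own backedges plus its parent's) are my reading of "the set of edges is extended", not the implemented logic.

```julia
# Toy model only: plain structs that mimic the `def` / `backedges` fields
# from the proposal. These are NOT the Core types.
struct ToyMethod
    name::Symbol
end

abstract type AbstractEdgeSpec end

mutable struct EdgeSpec{D} <: AbstractEdgeSpec
    # Either the method itself, or a parent specialization to inherit edges from.
    const def::Union{ToyMethod, AbstractEdgeSpec}
    backedges::Vector{Any}
    data::D
end

struct ToyDefault end     # stands in for DefaultSpec
struct ToyOverlay end     # hypothetical tag for an overlay-table absint
struct ToyConstProp end   # hypothetical tag for a wrapping, finer-grained absint

# Collect the edges relevant to `spec`: its own backedges plus, while `def`
# is another specialization, the backedges of each parent in the chain.
function effective_backedges(spec::EdgeSpec)
    edges = copy(spec.backedges)
    parent = spec.def
    while parent isa EdgeSpec
        append!(edges, parent.backedges)
        parent = parent.def
    end
    return edges
end

m = ToyMethod(:f)
native  = EdgeSpec{ToyDefault}(m, Any[:g], ToyDefault())
overlay = EdgeSpec{ToyOverlay}(m, Any[:gpu_g], ToyOverlay())        # mode 1: def is a Method
refined = EdgeSpec{ToyConstProp}(native, Any[:h], ToyConstProp())   # mode 2: def is a specialization

effective_backedges(overlay)   # only its own edges, fully separate from `native`
effective_backedges(refined)   # its own edges followed by those of `native`
```

In this toy, invalidating something `native` depends on would also hit `refined`, while `overlay` is untouched unless one of its own edges is invalidated.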
Additionally, all non-MethodInstance MethodSpecializations are allowed
in Expr(:invoke). There's some TBD still for how to handle recovery
on reload, but in principle everything should just go through. This thus
closes #52797 as it addresses the same use case, but with proper edge
tracking.
Currently, there are two D tags that the runtime system uses:
DefaultSpec, which holds the memoization of sparams and the various
lock bits that the runtime uses, and UninferredSpec, which is a singleton
and replaces the owner === :uninferred CodeInstances introduced in #54362.
I anticipate extending this further for effect preconditions and various
finer-than-type specializations in Base.
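As a rough illustration of what moving the owner into the type tag buys, the sketch below dispatches on the D parameter instead of comparing an owner field at run time. Everything here is a toy stand-in: TaggedSpec, the Toy* tag types, is_uninferred, and describe are hypothetical names, and ToyGPUSpec is only an example of what an external package might define.

```julia
# Toy stand-ins for the proposed tags; not the real Core definitions.
struct ToyDefaultSpec
    sparam_vals::Core.SimpleVector
end
struct ToyUninferredSpec end   # singleton, like the proposed UninferredSpec
struct ToyGPUSpec end          # hypothetical tag owned by an external compiler

mutable struct TaggedSpec{D}
    specTypes::Type{<:Tuple}
    data::D
end

const ToyMethodInstance = TaggedSpec{ToyDefaultSpec}

# The tag is part of the type, so ordinary dispatch replaces
# checks such as `owner === :uninferred`.
is_uninferred(::TaggedSpec) = false
is_uninferred(::TaggedSpec{ToyUninferredSpec}) = true

describe(::ToyMethodInstance) = "native specialization"
describe(::TaggedSpec{ToyUninferredSpec}) = "uninferred source"
describe(::TaggedSpec{D}) where {D} = "external specialization tagged $(nameof(D))"

sig = Tuple{typeof(sin), Float64}
mi    = ToyMethodInstance(sig, ToyDefaultSpec(Core.svec()))
uninf = TaggedSpec{ToyUninferredSpec}(sig, ToyUninferredSpec())
gpu   = TaggedSpec{ToyGPUSpec}(sig, ToyGPUSpec())

describe(mi), describe(uninf), describe(gpu)
```

All three objects share the same specTypes here, which is the point: one signature can carry several specializations that differ only in their tag.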
Current Status
This PR changes the data structures, but does not yet provide the
Core.Compiler utilities for cache lookup in non-default
MethodSpecializations. That's on my immediate to do list.
Additionally, the new edge/invalidation logic described above
is not yet implemented. I also haven't tried the #52797 replacement
yet to make sure it actually works properly. I'm putting this up
as a draft to make sure that all relevant package developers have
a chance to complain if I missed something important.
Making them compilable/executable is #52964 since that requires tracking the current compiler instance through dynamic dispatch.
It requires tracking them through dynamic dispatch if you want dynamic dispatch to do something other than the default. That's an orthogonal feature to this. This just lets you have multiple specializations with different ABIs for one particular method. They may :invoke each other, but if they dynamic dispatch, that still goes through the default compiler. #52964 is still desirable, but a separate concern (though it may be able to re-use part of the mechanism here).
I suppose in your proposal we would add a GPUSpec or ForeignSpec?
Yes
But right now, in many places the code assumes that it is DefaultSpec.
Yes, I need to go through and disambiguate the assumptions, but that's a fair bit of work, so I wanted to get agreement on the direction first.
Part of my goal with owner is that we could re-use the code in many places and reduce the reliance on solutions like spinning up a separate OrcJIT.
That's still in scope, but again, not addressed by this. I think the cleanest way to address that is to also make CodeInstance parametric, so that the .inferred field can be something other than julia IR (e.g. LLVM IR) with corresponding helpers that know how to serialize that.
Yeah the challenge here is to figure out what fields have meaning and what is needed.
What are the requirements for something going through the native pipeline (but not wanting to poison it)? The ci->owner goal was to maximize re-use up until it becomes hard.
Almost everyone reuses inferred to store the result of the high-level pipeline.
Most GPUCompiler targets don't need specptr & co, but Enzyme would reuse those
if we add the option to customize the LLVM pipeline.
There are a few different usage modes:
- The external absint sets def to a Method. In this case, the set of edges is completely partitioned between the internal and external absint and can be managed according to absint requirements (e.g. this makes sense if the external absint is using an overlay table).
- The external absint sets def to another MethodSpecialization. In this case the set of edges is extended. This is intended to be used by absints that wrap another absint and produce more fine-grained specializations.
Is this behavior already implemented, or is it something that needs to be implemented?
Also, I'm curious about how this change should affect the existing cache overload system based on cache_owner. If external abstract interpreters use InternalCodeCache, it seems to no longer be necessary. And if an interpreter uses an external code cache, only code_cache will be needed, so this makes the system look like how it was before.
Is this behavior already implemented, or is it something that needs to be implemented?
It's partially implemented. Some caching/invalidation/precompile logic is missing.
Also, I'm curious about how this change should affect the existing cache overload system based on cache_owner
InternalCodeCache takes a specialization type, which replaces the cache_owner mechanism.
I don't yet see the end design and how it will work. It may be that an example of a non-DefaultSpec specialization would help me understand the goal.
Right now I think that the owner concept and this mechanism are orthogonal to each other. In particular I am worried that the polymorphism here will be limited since the C code only expects DefaultSpec.
Maybe the answer is for Base to provide OtherSpec that has an owner field?
My original attempt when working on the integrated cache started with adding the owner field on the MethodInstance, but that was awkward since we use mi as a "result type" of a method query, and at least in the GPUCompiler world we first performed the query independently and then used it to drive compilation.
Right now I think that the owner concept and this mechanism are orthogonal to each other. In particular I am worried that the polymorphism here will be limited since the C code only expects DefaultSpec.
The C code is mostly fine. I've been using it extensively with non-DefaultSpec things. There's a few places that still need to be updated, but it largely works fine.
Maybe the answer is for Base to provide OtherSpec that has an owner field?
That would be fine.
My original attempt when working on the integrated cache started with adding the owner field on the MethodInstance, but that was awkward since we use mi as a "result type" of a method query, and at least in the GPUCompiler world we first performed the query independently and then used it to drive compilation.
Yeah, it's slightly awkward, but you can still do that and just look through the next field as InternalCodeCache does in this PR.
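For readers following along, "looking through the next field" could look roughly like the toy below: the method query still returns the default specialization, and a lookup walks the next chain from there until it finds the requested tag. This is a sketch under assumptions, not the InternalCodeCache code from the PR: ChainedSpec, find_spec, and the tag types are hypothetical, next is modeled as nullable for simplicity, and whether the chain is rooted at the default MethodInstance is my assumption.

```julia
abstract type AbstractChainedSpec end

mutable struct ChainedSpec{D} <: AbstractChainedSpec
    specTypes::Type{<:Tuple}
    # Toy stand-in for the untyped `next` link (no {D} on purpose).
    next::Union{AbstractChainedSpec, Nothing}
    data::D
end

struct ToyDefaultTag end
struct ToyGPUTag end   # hypothetical external tag

# Walk the chain starting at `mi` and return the first specialization
# tagged `D`, or `nothing` if the chain has no such entry.
function find_spec(mi::ChainedSpec, ::Type{D}) where {D}
    cur = mi
    while cur !== nothing
        cur isa ChainedSpec{D} && return cur
        cur = cur.next
    end
    return nothing
end

sig = Tuple{typeof(sin), Float64}
gpu = ChainedSpec{ToyGPUTag}(sig, nothing, ToyGPUTag())
mi  = ChainedSpec{ToyDefaultTag}(sig, gpu, ToyDefaultTag())   # what a method query would hand back

find_spec(mi, ToyGPUTag) === gpu
```

With this shape, the GPUCompiler-style flow of "query first, then drive compilation" keeps working: the query result is tag-agnostic, and the tag is only chosen at lookup time.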