truffleruby
Support AST sharing in Ruby
We need AST sharing in Ruby to be able to use advanced warmup and Native Image features. Most of the text below has been written by @eregon.
Many of the changes we'll need to make will have to be evaluated for performance regressions, so we may need to delay a little until we have a better way to do that at scale.
Can we track progress for that here (edit this comment)?
- [x] Make `nil` a global singleton (1dc2c2894ca8d88dc4b6220021b7e42b98317bdb)
- [x] Make Symbols context-independent so they can be referenced in the AST directly and shared between contexts.
- [x] The first big step is making the Translator not need a RubyContext.
- [x] Move as much immutable and context-independent state as possible from `RubyContext` to `RubyLanguage` (possibly in a separate `RubyEngine` class)
- [x] Replace the `RubyContext` field from `RubyBaseNode` with `ContextReference`
- [ ] Figure out how to make most inline caches work when there are multiple contexts sharing the same AST.
  - [x] For instance, we should use the same Shape in different RubyContexts for objects of the same class, so the inline caches based on Shape can actually be reused between multiple RubyContexts of the same Engine.
- [ ] The final goal is to not deopt while loading Ruby programs. To do that, we need to not invalidate any `Assumption` in multi-context mode during startup, notably while defining methods.
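To illustrate the `ContextReference` item above, here is a minimal Ruby sketch of the idea rather than TruffleRuby's actual Java code. `SketchContext`, `SketchContextReference` and `ReadGlobalNode` are hypothetical names invented for this example. The node holds no context field and asks for the currently entered context each time it executes, so one frozen node instance can serve several contexts.

```ruby
# Hypothetical stand-in for RubyContext: holds per-context mutable state.
class SketchContext
  attr_reader :name, :globals

  def initialize(name)
    @name = name
    @globals = {}
  end
end

# Hypothetical stand-in for ContextReference: tracks the context that is
# currently entered on this thread, instead of storing it in each node.
module SketchContextReference
  def self.enter(context)
    previous = Thread.current[:sketch_context]
    Thread.current[:sketch_context] = context
    yield
  ensure
    Thread.current[:sketch_context] = previous
  end

  def self.get
    Thread.current[:sketch_context] || raise("no context entered")
  end
end

# A shareable AST node: immutable, with no reference to any context.
class ReadGlobalNode
  def initialize(name)
    @name = name
    freeze
  end

  def execute
    # The context is looked up at execution time, not captured at parse time.
    SketchContextReference.get.globals[@name]
  end
end

node = ReadGlobalNode.new(:greeting)
a = SketchContext.new("a")
b = SketchContext.new("b")
a.globals[:greeting] = "hello from a"
b.globals[:greeting] = "hello from b"

result_a = SketchContextReference.enter(a) { node.execute }
result_b = SketchContextReference.enter(b) { node.execute }
```

The same `node` object yields a different result per context, which is the property a shared AST needs.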
Optimization:
- [x] Make `nil` a global immutable singleton `Nil.INSTANCE` like `NotProvided.INSTANCE` so it can be shared between contexts and is context-independent. But many nodes assume they only see primitives and DynamicObject (and nothing else).
- [x] Think about making `Bignum`s context-independent (otherwise we have to store Bignum literals as a constant index in the AST, and look up that index in each RubyContext). Maybe we can support `object_id` on them via `System.identityHashCode()`. I think we need to design a scheme for `object_id` for immutable instances shared between contexts, distinct from the usual per-context object_id's.
- [x] Same for `Symbol` literals. Symbols are immutable so they are context-independent. But `object_id` is still challenging.
- [x] Same for frozen `String` literals potentially, to avoid one copy per context. Non-frozen Strings are context-dependent though.
Inline cache for calling methods
- [x] We probably want to migrate the dispatch nodes to DSL nodes before doing changes there.
- First idea: changing the way we point to the class from a Ruby object:
Current: DynamicObject -> RubyClass
Proposed: DynamicObject -> classShapeReference -> classShape (used for inline cache)
The classShape is an immutable map of method names to CallTarget. We'd keep the current model when not using multiple contexts.
Instance variables (@ivars) and methods will likely become separated, which seems fine.
So a Ruby object would use the DynamicObject's Shape for instance variables, and another field for the classShapeReference. classShapeReference might be simply the context-specific class object.
How should we deal with methods defined in superclasses? We could walk the hierarchy chain and inline cache on every classShape found (I think JS does that), but the cost of that might be high.
We could also try to flatten and have all available methods in classShape but that would mean many updates of classShapeReferences when defining a method e.g. in Object or Kernel and threaten stability for inline caches.
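The classShape idea can be modeled roughly as follows. This is a sketch under assumptions, not TruffleRuby code: `ClassShape`, `ClassShapeReference`, `SketchObject` and `ShapeCallSite` are hypothetical names, and lambdas stand in for CallTargets. It shows two contexts sharing one inline cache entry, and the cache missing again once a method definition swaps in a new shape.

```ruby
# Hypothetical immutable map of method names to call targets (lambdas here).
class ClassShape
  attr_reader :table, :parent

  def initialize(table, parent = nil)
    @table = table.dup.freeze
    @parent = parent
    freeze
  end

  # Defining a method never mutates a shape; it produces a new one.
  def with_method(name, target)
    ClassShape.new(@table.merge(name => target), @parent)
  end

  # Walk the hierarchy chain until some shape has the method.
  def lookup(name)
    shape = self
    while shape
      target = shape.table[name]
      return target if target
      shape = shape.parent
    end
    nil
  end
end

# The mutable, context-specific indirection from an object to its class shape.
class ClassShapeReference
  attr_accessor :shape

  def initialize(shape)
    @shape = shape
  end
end

SketchObject = Struct.new(:class_shape_reference)

# A monomorphic inline cache keyed on the class shape, not on the class.
class ShapeCallSite
  attr_reader :misses

  def initialize(name)
    @name = name
    @cached_shape = nil
    @cached_target = nil
    @misses = 0
  end

  def call(receiver)
    shape = receiver.class_shape_reference.shape
    unless shape.equal?(@cached_shape)
      @misses += 1
      @cached_shape = shape
      @cached_target = shape.lookup(@name)
    end
    @cached_target.call(receiver)
  end
end

object_shape = ClassShape.new({ inspect: ->(_obj) { "#<obj>" } })
foo_shape = ClassShape.new({}, object_shape)

# Two contexts, each with its own reference, both pointing at the same shape.
ref_in_context_a = ClassShapeReference.new(foo_shape)
ref_in_context_b = ClassShapeReference.new(foo_shape)

shape_site = ShapeCallSite.new(:inspect)
shape_site.call(SketchObject.new(ref_in_context_a))
shape_site.call(SketchObject.new(ref_in_context_b))
misses_shared = shape_site.misses

# Defining Foo#bar in context A installs a new shape there.
ref_in_context_a.shape = foo_shape.with_method(:bar, ->(_obj) { :bar })
shape_site.call(SketchObject.new(ref_in_context_a))
misses_after_define = shape_site.misses
```

In this model the second context reuses the cache entry created by the first, but defining an unrelated method still costs the call site a miss.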
- Another idea: we could have very fine-grained Assumptions, one Assumption per (method name, class). That way, we could add methods to classes and only invalidate that (method name, class) Assumption when defining a method. Hopefully no call site depends on it, because such a call site would have called a different method (if there is one at all) before the new method got defined. As a result, the class/module wouldn't need to be cached on; only the methods would be immutable/shared. A class might see many sets of methods over time, so caching on the class seems likely to blow the inline cache quickly (e.g. there is a str.foo call site and later str.bar is defined: if we cache on the class, the call site becomes bimorphic for no reason, and +1 morphic for each new method added with a call in between). We might still need to associate classes between contexts in order to get those class+methodName Assumptions and the "no method with that name in this class" Assumptions.
class Foo < Object; end
obj = Foo.new
obj.object_id # a call site, which will check these assumptions:
# (Foo, "object_id")
# (Object, "object_id")
# (Kernel, "object_id") => found the method
class Foo
def bar; end
end
# would not invalidate the call site above, just the (Foo, "bar") assumption
So we'd create the assumptions lazily and based on each call site lookup through all classes until a method is found (we already do something similar, except assumptions are just per-class).
It sounds promising, but I'm not sure how it'd fare in practice. And of course the memory usage would go up a bit.
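The lazily created (method name, class) Assumptions described above could look roughly like this. It is a sketch only: `SketchAssumption`, `SketchModule` and `lookup_with_assumptions` are hypothetical names, and the ancestor chain (including Kernel) is simplified to a plain superclass chain.

```ruby
# Hypothetical model of an assumption: valid until explicitly invalidated.
class SketchAssumption
  attr_reader :name

  def initialize(name)
    @name = name
    @valid = true
  end

  def valid?
    @valid
  end

  def invalidate
    @valid = false
  end
end

class SketchModule
  attr_reader :name, :superclass, :table

  def initialize(name, superclass = nil)
    @name = name
    @superclass = superclass
    @table = {}
    @assumptions = {} # method name => assumption, created lazily
  end

  # One assumption per (method name, class), created on first lookup.
  def assumption_for(method_name)
    @assumptions[method_name] ||= SketchAssumption.new("(#{@name}, #{method_name})")
  end

  # Defining a method invalidates only the assumption for that name, if any.
  def define(method_name, body)
    @table[method_name] = body
    old = @assumptions.delete(method_name)
    old&.invalidate
  end
end

# Walks the chain, collecting one assumption per class visited until found.
def lookup_with_assumptions(klass, method_name)
  assumptions = []
  while klass
    assumptions << klass.assumption_for(method_name)
    body = klass.table[method_name]
    return [body, assumptions] if body
    klass = klass.superclass
  end
  [nil, assumptions]
end

kernel = SketchModule.new("Kernel")
object = SketchModule.new("Object", kernel)
foo = SketchModule.new("Foo", object)
kernel.define(:object_id, -> { 42 })

body, checked = lookup_with_assumptions(foo, :object_id)
checked_names = checked.map(&:name)

# Defining Foo#bar touches only the (Foo, bar) assumption, which nobody holds.
foo.define(:bar, -> { nil })
valid_after_bar = checked.all?(&:valid?)

# Overriding object_id in Foo does invalidate what the call site checked.
foo.define(:object_id, -> { 7 })
valid_after_override = checked.all?(&:valid?)
```

Memory grows with the number of distinct (method name, class) pairs that call sites actually look up, which matches the memory concern raised above.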
- Another way to look at it is in Ruby all methods are in the prototype, not directly on the object, and we can read methods out of the class (stored in a DynamicObject). Still that Shape check would be invalidated on every method added to that class.
- Yet another idea is to replace module Assumptions with a (volatile) field to check the "module version" much like MRI.
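The "module version" idea from the last bullet can be sketched as a counter that call sites compare on every call. `VersionedModule` and `VersionCheckedCallSite` are hypothetical names for this illustration, and the sketch makes no claim about how MRI or TruffleRuby implement it in detail.

```ruby
# Hypothetical module whose version is bumped on every method definition.
class VersionedModule
  attr_reader :version, :table

  def initialize
    @version = 0
    @table = {}
  end

  def define(name, body)
    @table[name] = body
    @version += 1
  end
end

# A call site that re-does the lookup whenever the module or version differs.
class VersionCheckedCallSite
  attr_reader :lookups

  def initialize(name)
    @name = name
    @cached_module = nil
    @cached_version = nil
    @cached_body = nil
    @lookups = 0
  end

  def call(mod, *args)
    unless mod.equal?(@cached_module) && mod.version == @cached_version
      @lookups += 1
      @cached_module = mod
      @cached_version = mod.version
      @cached_body = mod.table.fetch(@name)
    end
    @cached_body.call(*args)
  end
end

str = VersionedModule.new
str.define(:foo, -> { :foo })

version_site = VersionCheckedCallSite.new(:foo)
version_site.call(str)
version_site.call(str)
lookups_before = version_site.lookups

# Defining an unrelated method bumps the version, forcing one more lookup.
str.define(:bar, -> { :bar })
result_after = version_site.call(str)
lookups_after = version_site.lookups
```

The check is an ordinary field comparison on each call, so a new definition costs a repeated lookup rather than an invalidated Assumption. The trade-off is that the comparison is paid on every call.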
Is there a tool to help you know if you are capturing your context in your AST that we could start running in CI to see how far off we are?
The first big step is making the Translator not need a RubyContext.
The Translator uses nil for the FrameDescriptor default value though.
So we should probably start by making nil a global immutable singleton Nil.INSTANCE like NotProvided.INSTANCE so it can be shared between contexts and is context-independent. But many nodes assume they only see primitives and DynamicObject (and nothing else), so that might be a bit tricky to get right.
@bjfish is going to work on making nil a global immutable singleton Nil.INSTANCE.
nil has become a global singleton Nil.INSTANCE in 1dc2c2894ca8d88dc4b6220021b7e42b98317bdb.
Thanks for this work!
The next "small" step is to make Symbols context-independent so they can still be referenced directly in the shared AST. That's fine as Symbols are immutable.
We should store the SymbolTable instance as a field of RubyLanguage, not as a static final field as that could be problematic for persistent warmup.
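As a rough illustration of that ownership, here is a Ruby sketch in which the symbol table is an instance field of the language object rather than global state. `SketchSymbol`, `SketchSymbolTable`, `SketchLanguage` and `LanguageContext` are hypothetical names, not the real classes.

```ruby
# Hypothetical immutable symbol, safe to reference from a shared AST.
class SketchSymbol
  attr_reader :name

  def initialize(name)
    @name = name.dup.freeze
    freeze
  end
end

# Interns symbols by name; guarded by a lock since contexts may run in parallel.
class SketchSymbolTable
  def initialize
    @symbols = {}
    @lock = Mutex.new
  end

  def intern(name)
    @lock.synchronize { @symbols[name] ||= SketchSymbol.new(name) }
  end
end

# The language instance owns the table as a field, not as global state.
class SketchLanguage
  attr_reader :symbol_table

  def initialize
    @symbol_table = SketchSymbolTable.new
  end

  def create_context
    LanguageContext.new(self)
  end
end

# Contexts hold no symbols of their own; they delegate to their language.
class LanguageContext
  def initialize(language)
    @language = language
  end

  def symbol(name)
    @language.symbol_table.intern(name)
  end
end

language = SketchLanguage.new
context_a = language.create_context
context_b = language.create_context
shared_within_language = context_a.symbol("foo").equal?(context_b.symbol("foo"))

other_language = SketchLanguage.new
other_symbol = other_language.create_context.symbol("foo")
shared_across_languages = context_a.symbol("foo").equal?(other_symbol)
```

Contexts of one language instance see identical symbol objects, while a separate language instance starts with a fresh table instead of inheriting global state.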
There was a lot of progress here, mostly by @bjfish. The Translator no longer needs a context, and the AST no longer stores the RubyContext. The biggest remaining part is how to make inline caches for method calls and constant reads work nicely with a shared AST (i.e., without caching on context-specific objects).
Link: context-independent method lookup is being implemented in https://github.com/oracle/truffleruby/pull/2676