truffleruby Support AST sharing in Ruby

We need AST sharing in Ruby to be able to use advanced warmup and Native Image features. Most of the text below has been written by @eregon.

Many of the changes we need to make will have to be evaluated for performance regressions, so we may need to delay a little until we have a better way to do that at scale.

Can we track progress for that here (edit this comment)?

[x] Make nil a global singleton (1dc2c2894ca8d88dc4b6220021b7e42b98317bdb)
[x] Make Symbols context-independent so they can be referenced in the AST directly and shared between contexts.
[x] The first big step is making the Translator not need a RubyContext.
[x] Move as much immutable and context-independent state as possible from RubyContext to RubyLanguage (possibly in a separate RubyEngine class)
[x] Replace the RubyContext field in RubyBaseNode with a ContextReference
[ ] Figure out how to make most inline caches work when there are multiple contexts sharing the same AST.
[x] For instance, we should use the same Shape in different RubyContext for objects of the same class, so the inline caches based on Shape can actually be reused between multiple RubyContexts of the same Engine.
[ ] The final goal is to not deopt while loading Ruby programs. To do that, we need to not invalidate any Assumption in multi-context mode during startup, notably while defining methods.

Optimization:

[x] Make nil a global immutable singleton Nil.INSTANCE like NotProvided.INSTANCE so it can be shared between contexts and is context-independent. But many nodes assume they only see primitives and DynamicObject (and nothing else).
[x] Think about making Bignums context-independent (otherwise we have to store Bignum literals as a constant index in the AST, and look up that index in each RubyContext). Maybe we can support object_id on them via System.identityHashCode(). I think we need to design a scheme for object_id for immutable instances shared between contexts, distinct from the usual per-context object_ids.
[x] Same for Symbol literals. Symbols are immutable so they are context-independent. But object_id is still challenging.
[x] Same for frozen String literals potentially, to avoid one copy per context. Non-frozen Strings are context-dependent though.

Inline cache for calling methods

[x] We probably want to migrate the dispatch nodes to DSL nodes before doing changes there.
First idea: changing the way we point to the class from a Ruby object:

Current: DynamicObject -> RubyClass

Proposed: DynamicObject -> classShapeReference -> classShape (used for inline cache)

The classShape is an immutable map of method names to CallTargets. We'd keep the current model when not using multiple contexts.

Instance variables (@ivars) and methods will likely become separated, which seems fine. So a Ruby object would use the DynamicObject's Shape for instance variables, and another field for the classShapeReference. classShapeReference might be simply the context-specific class object.
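To make the proposed indirection concrete, here is a minimal sketch in plain Ruby. It is only a model under assumptions: `ClassShape`, `ClassShapeReference`, `Obj` and `CallSite` are hypothetical names, lambdas stand in for CallTargets, and none of this is TruffleRuby code. It shows one call site whose inline cache is keyed on the shared immutable shape, so objects from two different contexts hit the same cache entry.

```ruby
# Hypothetical model; none of these names are TruffleRuby classes.
# An immutable map of method names to call targets (lambdas stand in for CallTargets).
ClassShape = Struct.new(:table) do
  # Defining a method yields a new immutable shape; the old one is untouched.
  def with_method(name, target)
    ClassShape.new(table.merge(name => target).freeze).freeze
  end
end

# Stands in for the context-specific class object that points at a shape.
ClassShapeReference = Struct.new(:shape)

# A Ruby object; instance variables would live in the DynamicObject Shape (not modelled).
Obj = Struct.new(:class_shape_ref)

# A monomorphic inline cache keyed on the shared, context-independent shape.
class CallSite
  attr_reader :misses

  def initialize(name)
    @name = name
    @cached_shape = nil
    @cached_target = nil
    @misses = 0
  end

  def call(obj)
    shape = obj.class_shape_ref.shape
    unless shape.equal?(@cached_shape)
      # Cache miss: do the full lookup and remember the shape.
      @misses += 1
      @cached_shape = shape
      @cached_target = shape.table.fetch(@name)
    end
    @cached_target.call(obj)
  end
end

EMPTY_SHAPE = ClassShape.new({}.freeze).freeze
foo_shape = EMPTY_SHAPE.with_method(:greet, ->(_obj) { "hello" })

# Each context has its own class object, but both point at the same shape.
context_a_foo = ClassShapeReference.new(foo_shape)
context_b_foo = ClassShapeReference.new(foo_shape)

site = CallSite.new(:greet)
result_a = site.call(Obj.new(context_a_foo))
result_b = site.call(Obj.new(context_b_foo))
```

In this model the second call does not miss, because the cache compares shapes rather than the per-context class objects.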

How should we deal with methods defined in superclasses? We could walk the hierarchy chain and inline cache on every classShape found (I think JS does that), but the cost of that might be high. We could also try to flatten and have all available methods in classShape but that would mean many updates of classShapeReferences when defining a method e.g. in Object or Kernel and threaten stability for inline caches.

Another idea: we could have very fine-grained Assumptions, one Assumption per (method name, class). That way, we could add methods to classes and only invalidate that (method name, class) Assumption when defining a method. Hopefully no call site depends on it, because such a call site would have called a different method (if there is one at all) before the new method got defined. => The class/module wouldn't need to be cached on; only the methods would be immutable/shared. A class might see many sets of methods over time, so caching on the class seems likely to blow the inline cache quickly (e.g. there is a str.foo call site and then later str.bar is defined; if we cache on the class, the call site becomes bimorphic for no reason, and +1 morphic for each new method added with a call in between). We might still need to associate classes between contexts so as to get those class+methodName Assumptions and the "no method with that name in this class" Assumptions.

class Foo < Object; end

obj = Foo.new
obj.object_id # a call site, which will check these assumptions:
# (Foo, "object_id")
# (Object, "object_id")
# (Kernel, "object_id") => found the method

class Foo
  def bar; end
end
# would not invalidate the call site above, just the (Foo, "bar") assumption

So we'd create the assumptions lazily and based on each call site lookup through all classes until a method is found (we already do something similar, except assumptions are just per-class).

It sounds promising, but I'm not sure how it'd fare in practice. And of course the memory usage would go up a bit.
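As a rough illustration of the per-(class, method name) idea, the following plain-Ruby sketch models lazily created assumptions. Everything here is hypothetical (`Assumption`, `Registry`, symbols standing in for classes, lambdas standing in for methods); it only models the bookkeeping, not how TruffleRuby implements Assumptions.

```ruby
# Hypothetical model of per-(class, method name) assumptions; not TruffleRuby code.
class Assumption
  def initialize
    @valid = true
  end

  def valid?
    @valid
  end

  def invalidate
    @valid = false
  end
end

class Registry
  def initialize
    # Assumptions are created lazily, the first time a lookup touches a pair.
    @assumptions = Hash.new { |hash, key| hash[key] = Assumption.new }
    @methods = {}
  end

  def define(klass, name, body)
    @methods[[klass, name]] = body
    # Only call sites whose lookup passed through (klass, name) are affected.
    assumption = @assumptions.delete([klass, name])
    assumption&.invalidate
  end

  # Walks the ancestors until the method is found, collecting one assumption per step.
  def lookup(ancestors, name)
    checked = []
    ancestors.each do |klass|
      checked << @assumptions[[klass, name]]
      body = @methods[[klass, name]]
      return [body, checked] if body
    end
    [nil, checked]
  end
end

registry = Registry.new
registry.define(:Kernel, :object_id, -> { 42 })

# The obj.object_id call site: checks (Foo, Object, Kernel) x "object_id".
ancestors = [:Foo, :Object, :Kernel]
target, assumptions = registry.lookup(ancestors, :object_id)

# Defining Foo#bar touches only the (Foo, "bar") pair.
registry.define(:Foo, :bar, -> { :bar })
still_valid = assumptions.all?(&:valid?)

# Overriding object_id in Foo invalidates the (Foo, "object_id") assumption.
registry.define(:Foo, :object_id, -> { 7 })
now_invalid = assumptions.any? { |assumption| !assumption.valid? }
```

The memory cost mentioned above shows up here as one `Assumption` per pair visited during lookup, three for this single call site.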

Another way to look at it is that in Ruby all methods are in the prototype, not directly on the object, and we can read methods out of the class (stored in a DynamicObject). Still, that Shape check would be invalidated on every method added to that class.
Yet another idea is to replace module Assumptions with a (volatile) field to check the "module version" much like MRI.
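The module-version idea can be sketched the same way. This is a hypothetical plain-Ruby model (`VersionedModule` and `VersionedCallSite` are made-up names, and Ruby has no volatile fields, so an ordinary instance variable stands in for one): the call site compares a version number instead of checking an Assumption, and any method definition in the module bumps that number.

```ruby
# Hypothetical sketch of a per-module version counter; not TruffleRuby code.
class VersionedModule
  attr_reader :version

  def initialize
    @version = 0
    @methods = {}
  end

  def define(name, body)
    @methods[name] = body
    @version += 1 # any definition in this module bumps the version
  end

  def lookup(name)
    @methods.fetch(name)
  end
end

class VersionedCallSite
  attr_reader :lookups

  def initialize(name)
    @name = name
    @cached_module = nil
    @cached_version = nil
    @target = nil
    @lookups = 0
  end

  def call(mod)
    # The guard is a plain field comparison rather than an Assumption check.
    unless mod.equal?(@cached_module) && mod.version == @cached_version
      @lookups += 1
      @cached_module = mod
      @cached_version = mod.version
      @target = mod.lookup(@name)
    end
    @target.call
  end
end

string_module = VersionedModule.new
string_module.define(:foo, -> { :foo })

foo_site = VersionedCallSite.new(:foo)
foo_site.call(string_module)
foo_site.call(string_module)
lookups_before = foo_site.lookups

# Defining an unrelated method forces the foo call site to look up again.
string_module.define(:bar, -> { :bar })
foo_result = foo_site.call(string_module)
lookups_after = foo_site.lookups
```

Compared with per-(class, method name) assumptions, this keeps less state per module, but an unrelated definition still costs every call site on that module a fresh lookup.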

Nov 28 '19 14:11 chrisseaton

Is there a tool to help you know if you are capturing your context in your AST that we could start running in CI to see how far off we are?

Nov 28 '19 14:11 chrisseaton

The first big step is making the Translator not need a RubyContext.

The Translator uses nil for the FrameDescriptor default value though. So probably we should start by making nil a global immutable singleton Nil.INSTANCE like NotProvided.INSTANCE so it can be shared between contexts and is context-independent. But many nodes assume they only see primitives and DynamicObject (and nothing else), so that might be a bit tricky to get right.

Nov 28 '19 16:11 eregon

@bjfish is going to work on making nil a global immutable singleton Nil.INSTANCE.

Feb 18 '20 11:02 eregon

nil has become a global singleton Nil.INSTANCE in 1dc2c2894ca8d88dc4b6220021b7e42b98317bdb.

Feb 20 '20 14:02 eregon

Thanks for this work!

Feb 20 '20 20:02 chrisseaton

The next "small" step is to make Symbols context-independent so they can still be referenced directly in the shared AST. That's fine as Symbols are immutable. We should store the SymbolTable instance as a field of RubyLanguage, not as a static final field, as that could be problematic for persistent warmup.

Apr 28 '20 14:04 eregon

There was a lot of progress here, mostly by @bjfish. The Translator no longer needs a context, and the AST no longer stores the RubyContext. The biggest part remaining is how to make the inline caches for method calls and constant reads work nicely (i.e., not caching on context-specific objects) with a shared AST.

Nov 10 '20 16:11 eregon

Link: context-independent method lookup is being implemented in https://github.com/oracle/truffleruby/pull/2676

Jun 10 '22 12:06 eregon
