truffleruby icon indicating copy to clipboard operation
truffleruby copied to clipboard

New test failure in the Zeitwerk test suite

Open fxn opened this issue 2 years ago • 16 comments

Hi!

The suite of Zeitwerk has this test to check autoloading blocks context switching (autoloading is thread-safe):

test "constant definition is synchronized" do
  files = [["m.rb", <<-EOS]]
    module M
      sleep 0.5

      def self.works?
        true
      end
    end
  EOS
  with_setup(files) do
    t = Thread.new { M }
    assert M.works?
    t.join
  end
end

It has passed regularly for a long time, but it started to fail in a flaky way both in truffleruby and in truffleruby-head. See for example this build, or this build.

Does it ring a bell?

fxn avatar Aug 27 '21 19:08 fxn

This is a shell script that essentially does the same as the flaky test:

#!/bin/sh

cat <<EOS > m.rb
module M
  sleep 0.5

  def self.works?
    true
  end
end
EOS

ruby -I. <<EOS
autoload :M, "m"
t = Thread.new { M }
p M.works?
EOS

rm m.rb

If it passes, you see "true" printed. Otherwise, M does not respond to works? (NoMethodError) because there was a context switch before the require triggered by autoload returned.

fxn avatar Aug 27 '21 23:08 fxn

With CRuby, that passes even with sleep 60.

fxn avatar Aug 27 '21 23:08 fxn

Thanks a lot for the report. This is an area where the semantics of CRuby seems unclear, e.g. does it make other threads wait until the end of module M or the end of the file until the constant M is "published"? What happens if the autoloaded file creates threads, do they see M?

In short this is not yet implemented in TruffleRuby (IIRC), because of unclear semantics. Currently there is a lock on who wins to autoload a constant, but currently no lock for "wait some other thread has finished autoloading", since the constant is published as soon as assigned. Preventing thread switching itself seems difficult on JVM, and would feel like a hack at best + potential for deadlocks. I guess we should look at what CRuby does here but the autoload logic in CRuby seems particularly unreadable.

Re Zeitwerk CI, could you exclude this test for TruffleRuby and keep running other tests? If it passed before I would think it was just lucky timings.

eregon avatar Aug 28 '21 14:08 eregon

BTW to improve the reliability of that test I believe you would need a sleep (smaller, e.g. of 0.1) between the Thread.new and assert M.works?, otherwise it's likely the M.works? runs first and then the other thread will see the constant is autoloading and wait, and anyway that thread does not check what is defined on M. For the test to fail it needs to be the Thread.new running first and defining the constant but not yet the method, before the main thread keeps running and call the method.

eregon avatar Aug 28 '21 14:08 eregon

Problem is, this is a fundamental property of Zeitwerk, which is a fundamental property of CRuby: You won't see a class half-defined due to context switching while autoloading. Meaning, what the test does (of course, a class may be reopened elsewhere later, not in a generic sense).

BTW to improve the reliability of that test I believe you would need a sleep (smaller, e.g. of 0.1) between the Thread.new and assert M.works?

You're right.

fxn avatar Aug 28 '21 14:08 fxn

Test revised here: https://github.com/fxn/zeitwerk/commit/016404cc94c9940e31cf057955d7ecdd43f70f54. Thanks for the comment about that :).

fxn avatar Aug 28 '21 15:08 fxn

Hey! I have documented this in the README. I believe it is accurate, but let me know if it should word it in a different way :).

fxn avatar Sep 24 '21 09:09 fxn

That looks accurate, I'll try to fix this issue soon.

eregon avatar Sep 24 '21 10:09 eregon

Some more details about how CRuby handles this: https://github.com/ruby/ruby/pull/5788 And: (from @ioquatix)

No, it's literally cached, there is an autoload cache Instead of setting the constant on the module, it's stashed into an autoload save area once the autoload is done, all the constants are added to the module and all waiting threads are unblocked

eregon avatar May 10 '22 08:05 eregon

I'm slowly figuring out how it works, I guess it's not a bad implementation, but I think variable.c could do with a rewrite. lol. https://github.com/ruby/ruby/pull/5898 fyi

ioquatix avatar May 10 '22 11:05 ioquatix

FWIW, I found https://bugs.ruby-lang.org/issues/15663#change-77214 where I asked some autoload questions and which makes it somewhat clear it's so complex nobody understands it fully. I think it shouldn't be too bad to implement in TruffleRuby because we already have a lock per autoload constant. But potentially deadlocks is an issue to consider, i.e., what if one thread is loading A and that needs B, and another thread is loading B and that needs A? Deadlock or the "isolation" of autoload is broken in that case?

eregon avatar May 10 '22 16:05 eregon

I am not familiar with the implementation, but isn't require already synchronized to avoid seeing partially evaluated files elsewhere? Why aren't these autoloads inheriting or relying on this property?

fxn avatar May 10 '22 16:05 fxn

@fxn require only synchronizes per file, i.e., to guarantee a file is only loaded once (that's easy enough). That doesn't prevent that e.g. class C defined in c.rb is seen partially defined (e.g. some methods defined, some not, only part of c.rb was evaluated) by file use.rb being loaded in another thread and e.g. doing C.new.foo -> potential NoMethodError.

autoload makes it possible to know some of those dependencies, but nested autoloads cannot be known in advance. I think I'll need to write a test script to figure what CRuby does there and if/how it deals with those potential deadlocks.

The general strategy for autoload seems to not publish a constant's value while the file is being loaded (so other threads cannot see it's set and continue executing, they have to block and wait), and instead the autoload logic stashes the value somewhere so only that thread can see it, and then publish the constant at once when the file has been loaded. Not sure about other constants defined in the file, I'd think there is no special behavior there.

eregon avatar May 10 '22 16:05 eregon

@eregon But if user.rb uses C, it should require 'c' before at some point in the code path, no?

fxn avatar May 10 '22 16:05 fxn

Something that could be relevant in @ioquatix foobar.rb example (just a remark, I don't know from the top of my head) is the constant lookup that happens in module Foo; end may trigger the autoload for Foo, depending on load order. Same for the constant lookup that happens in module Bar; end. So, one of them is trying to require a file that is being required by the other one simultaneously.

fxn avatar May 10 '22 16:05 fxn

So, let me repeat the question.

The example sets autoloads for two constants, pointing to the same file. Since that is synchronized, why aren't the autoloads?

I'd expect the second one to trigger a require that blocks, waits, and when it returns, the constant is defined and all continues.

This argument is incorrect, as shown by the tests, but in what sense?

fxn avatar May 10 '22 17:05 fxn