net-imap
net-imap copied to clipboard
✅🚧 Add JRuby to CI
Combined with #528, this replaces #470, and fixes #454.
Hey checking back in after a long time. I'd like to help get this green... have you looked into the failures at all?
@headius Thanks!
I haven't looked into it recently, beyond rebasing the branch and verifying it still had the same issues... the tests that (reliably) fail in CI are all passing locally for me. That certainly slows down my debugging ability.
I don't remember for sure, but I think that some of the other tests that are already omitted or pending in this branch may also pass locally (or maybe they're flaky?). I'd prefer to fix all of them and omit none: omitting every test that actually makes a connection doesn't give me much confidence. But, for the ratchet effect, getting JRuby into CI with a bunch of critical tests omitted is probably better than not testing it at all.
The following is going off of memory the last time I looked into this... and I didn't have time to fully test my hypothesis, so I may be completely wrong, buuuut (caveat's aside)... I think the biggest source of test failures are exceptions from IO#close. My interpretation of the CRuby IO#close code is that I don't think it can raise that exception.
When IO#close is called, rb_io_close_m checks (with the GVL) whether or not it's already closed. And if not, io_close_fptr goes ahead and closes it, without dropping the GVL (as far as I can tell) or adding itself to the list of threads that are blocking on it. At this point, (if it were to release the GVL) if any other thread attempts to close the IO, it would just return immediately without exception. After that, it notifies any other threads that are blocking on that IO to raise that exception.
But I might be misunderstanding what the code does and biased by what I want it to do. 😉 I found some issues in the b.r-l.o tracker that led me to think maybe it didn't or doesn't work the way I assume. So, I looked to see if there was a test or a spec, and there was (and I think it was written by you, so thanks!). But (as far as I can tell) the spec for this doesn't distinguish between an error that's raised by IO#close or an error that's caused by IO#close and raised by another thread that's blocked on IO#close.
it 'raises an IOError with a clear message' do
matching_exception = nil
-> do
IOSpecs::THREAD_CLOSE_RETRIES.times do
read_io, write_io = IO.pipe
going_to_read = false
thread = Thread.new do
begin
going_to_read = true
read_io.read
rescue IOError => ioe
if ioe.message == IOSpecs::THREAD_CLOSE_ERROR_MESSAGE
matching_exception = ioe
end
# try again
end
end
# best attempt to ensure the thread is actually blocked on read
Thread.pass until going_to_read && thread.stop?
sleep(0.001)
read_io.close # <--------------------------- does this raise the exception?
thread.join
write_io.close # <--------------------------- does this raise the exception?
matching_exception&.tap {|ex| raise ex} # <---- or does this raise the exception?
end
end.should raise_error(IOError, IOSpecs::THREAD_CLOSE_ERROR_MESSAGE)
end
My hypothesis is that CRuby's IO#close can't raise this exception, but JRuby's can. I'd like to update the spec to make that distinction. But... I had to switch to other projects before I could test this hypothesis.
With all of that said, even if JRuby's IO#close does behaves differently, it's highly likely that there's still threading bugs in the tests or in the client that need to be fixed. We can still improve the code to not trigger that condition, not just as a workaround but as an improvement that may fix other issues too.
Okay, so looking into it just now, I think my memory or interpretation (or both!) was off by a bit: any blocking threads are notified before the file descriptor is completely cleared. That does introduce the possibility that multiple threads could call IO#close while waiting for the blocking threads to be notified. And, the fiber scheduler also gets to play a role in this, which opens up even more possibilities.
Nevertheless, it still seems to me that, the closer doesn't add itself blocking list for that IO, so it should never be interrupted with that exception. And, after all of the (other) blocking threads have been removed from the blocking list, the closer thread gets to do its thing while keeping the GVL.
Regardless, I'll repeat myself on this:
even if JRuby's IO#close does behaves differently, it's highly likely that there's still threading bugs in the tests or in the client that need to be fixed. We can still improve the code to not trigger that condition, not just as a workaround but as an improvement that may fix other issues too.