webmachine-ruby icon indicating copy to clipboard operation
webmachine-ruby copied to clipboard

The consistently failing spec

Open ghost opened this issue 11 years ago • 64 comments

This has been mentioned in the other PR, and I remember a couple of random build failures in the past. Can someone point me to a failed build or provide intel? Let's try to come up with a fix. :zap:

ghost avatar Oct 21 '13 22:10 ghost

Pretty sure this is a cascading failure in the adapter specs. Seems the most common culprit is the Reel adapter spec. If it fails, the next adapter test fails too. I struggled with the flakiness of these tests for a long time. They pass consistently on my machine every time, but not so much on Travis.

samwgoldman avatar Oct 22 '13 01:10 samwgoldman

Can you point me at one of the Reel failures?

tarcieri avatar Oct 22 '13 01:10 tarcieri

Here's a JRuby test run where the Reel adapter fails. Later, the Rack adapter fails.

https://travis-ci.org/seancribbs/webmachine-ruby/jobs/12748305

samwgoldman avatar Oct 22 '13 02:10 samwgoldman

I'm able to unreliably reproduce this with bundle exec rspec --order rand:53167 spec/webmachine/.

ghost avatar Oct 23 '13 19:10 ghost

That's the spec seed of https://travis-ci.org/seancribbs/webmachine-ruby/jobs/12748303

ghost avatar Oct 23 '13 19:10 ghost

I'm not sure we should bother fixing the root cause of this leak. We could as well just use a fresh random port for each example, and let garbage collection take care of the rest. And when the process exits, they'll get released anyway.

Example: https://github.com/lgierth/aeee/blob/master/spec/support/helper.rb#L15-L20

ghost avatar Oct 23 '13 21:10 ghost

Yeah, I previously had the adapter tests do just that (89ce86e568eb368b5c05f84f2dd1f97d45b7a26d). I removed that because I thought I had gotten the servers to play nice and shut down.

Seems like web servers generally expect to be at the center of the universe. Maybe the Adapter#shutdown API is just a pipe dream.

samwgoldman avatar Oct 23 '13 22:10 samwgoldman

(PR reference was a typo)

nyarly avatar Oct 24 '13 20:10 nyarly

One of today's builds shows a new error from Reel on rbx-19mode: https://travis-ci.org/seancribbs/webmachine-ruby/jobs/14224187#L1622

Could it be this is the actual error? Maybe the error gets swallowed most of the time due to something. It also always seem to be the same three examples that fail, except for the mentioned JRuby failure - where it's all of the examples in that example group.

On the other hand, the error could as well just be a different symptom of something network-related.

cc @tarcieri

ghost avatar Nov 19 '13 23:11 ghost

@lgierth hmmm...

ArgumentError: Data object has already been freed

This could either be an http_parser.rb bug on a Reel bug. Not sure? /cc @tmm1

tarcieri avatar Nov 20 '13 00:11 tarcieri

I just experienced another symptom, the first example hangs indefinitely, and isn't even affected by timeout(5) in #reel_server: https://gist.github.com/lgierth/7554999

Note how i send SIGINT in line 6, and the difference in time between the former and latter log timestamps.

ghost avatar Nov 20 '13 00:11 ghost

Also note the spec command which uses --order rand:53167, as an earlier CI failure did (see my first comment after opening this issue), but runs only the Reel spec.

ghost avatar Nov 20 '13 00:11 ghost

the first example hangs indefinitely

And obviously also the second example, as I sent SIGINT again in line 7...

ghost avatar Nov 20 '13 00:11 ghost

This is possibly inside of Celluloid's at_exit handler /cc @halorgium

tarcieri avatar Nov 20 '13 00:11 tarcieri

here's an example of random Reel errors I see often: https://travis-ci.org/seancribbs/webmachine-ruby/jobs/14209468 it happened on rbx this time.

ghost avatar Nov 20 '13 02:11 ghost

@robgleeson that's the same error:

ArgumentError: Data object has already been freed

Again this is either a bug in:

  1. rbx
  2. http_parser.rb

tarcieri avatar Nov 20 '13 02:11 tarcieri

I would guess http_parser.rb if the same fail occurs on jruby.

ghost avatar Nov 20 '13 02:11 ghost

If the problem is Travis running multiple adapters in one build, and granted that that's not a real use case (?), could the adapter be parameterized to an env var and the build split into per-adapter?

nyarly avatar Nov 25 '13 18:11 nyarly

@samwgoldman regarding shutdown vs. random port assignment, I'm having trouble with shutdown and Rack+Mongrel. @server.server.shutdown seems to work only with WEBrick.

On a sidenote, I also noticed that only the WEBrick and Hatetepe adapters register shutdown as a SIGINT handler. FWIW, Rack::Handler#start traps SIGINT as well, and it simply exits the process unless the adapter implements shutdown.

ghost avatar Dec 03 '13 00:12 ghost

@lgierth Confirm that the shutdown implementation on the Rack adapter only works with Rack+WEBrick. This is definitely a problem. From the land of wishes, I really wish Ruby web servers provided usable shutdown methods. Until they do, it would seem that Adapter#shutdown is not a realistic API to maintain.

I guess that leaves us with two options: 1) random port assignment for adapter tests in a single test process and 2) separate test processes for each adapter. I'll also note that adapters can now be extracted into gems which depend on the exported RSpec shared examples.

I somewhat snuck the Adapter#shutdown method into the public API, so its removal might require a greater-than-patch-level version bump.

samwgoldman avatar Dec 03 '13 04:12 samwgoldman

I somewhat snuck the Adapter#shutdown method into the public API, so its removal might require a greater-than-patch-level version bump.

From what I understand, adding support for PUT will require a major bump as well, so maybe we could combine the two.

ghost avatar Dec 03 '13 13:12 ghost

more fails that look random/buggy in nature: https://travis-ci.org/seancribbs/webmachine-ruby/jobs/14867321 (rbx) https://travis-ci.org/seancribbs/webmachine-ruby/jobs/14867321 (jruby)

ghost avatar Dec 03 '13 15:12 ghost

@robgleeson the Reel adapter is crashing on rbx. Not sure what's at fault: rbx or http_parser.rb. See:

ArgumentError: Data object has already been freed

Maybe @brixen or @dbussink can help? (or @tmm1 on the http_parser.rb side perhaps)

tarcieri avatar Dec 03 '13 17:12 tarcieri

and my comments mysteriously disappear, anyway the failures I linked to are not related to that error. it looks like reel can't acquire a port in one of its specs.

ghost avatar Dec 03 '13 18:12 ghost

@robgleeson in those tests I'm seeing Reel crash due to that error, and a connection refused error because the server isn't running... unless there's a different test failure I'm not seeing

tarcieri avatar Dec 03 '13 18:12 tarcieri

ah youre right, i didn't see that happen earlier in the trace. im not sure where the other failures are coming from(jruby/MRI). they look different and not related.

ghost avatar Dec 03 '13 18:12 ghost

jruby on travis refers to a 1.9 implementation right? I think we can get rid of the RUBY_VERSION guards from the gemfile if it is. i dont think 1.8 is being tested against anymore.

ghost avatar Dec 03 '13 18:12 ghost

build passed this time: https://travis-ci.org/seancribbs/webmachine-ruby/builds/14881037 dropped RUBY_VERSION guard.

ghost avatar Dec 03 '13 19:12 ghost

@robgleeson Travis switched the default for jruby (some time back) and rbx to 1.9 recently, hence #138. So yeah we must have tested against jruby-18mode until some point.

Officially dropping 1.8 support makes one more reason for a major release :)

There are a couple of other RUBY_VERSION branches in the specs, we should remove these as well then.

ghost avatar Dec 03 '13 22:12 ghost

Officially, it's already unsupported. :)

Sean Cribbs

On Dec 3, 2013, at 4:28 PM, Lars Gierth [email protected] wrote:

@robgleeson Travis switched the default for jruby (some time back) and rbx to 1.9 recently, hence #138. So yeah we must have tested against jruby-18mode until some point.

Officially dropping 1.8 support makes one more reason for a major release :)

There are a couple of other RUBY_VERSION branches in the specs, we should remove these as well then.

— Reply to this email directly or view it on GitHub.

seancribbs avatar Dec 03 '13 23:12 seancribbs