rails
rails copied to clipboard
Fix hanging test due to interlocking(?)
Alright, I'm not going to pretend that I understand exactly what's happening, but the fact is that this seems to fix something, so I'll try to explain what I understand.
The problem
Running the following test hangs forever:
$ bin/test test/engine/commands_test.rb:62
Run options: -v --seed 4349
# Running:
Rails::Engine::CommandsTest#test_server_command_broadcast_logs = 💥 Hangs here
Somehow I seem to have reproduced it with the command that would run on CI:
$ TEST_DIR=engine TESTOPTS="-v -n test_server_command_broadcast_logs" bundle exec rake test
--- test/engine/commands_test.rb
/Users/sto/.rubies/ruby-3.3.5/bin/ruby -w -Itest -Ilib -I../activesupport/lib -I../actionpack/lib -I../actionview/lib -I../activemodel/lib test/engine/commands_test.rb
Run options: -v -n test_server_command_broadcast_logs --seed 59220
# Running:
Rails::Engine::CommandsTest#test_server_command_broadcast_logs = 💥 Hangs here
I'm gonna venture a guess and assume that maybe this test (and 3 others) do not run in CI because of this condition:
https://github.com/rails/rails/blob/aacbb5c0f5bdd11f0dee78da03bd6859f0cabeba/railties/test/engine/commands_test.rb#L36
https://github.com/rails/rails/blob/aacbb5c0f5bdd11f0dee78da03bd6859f0cabeba/railties/test/console_helpers.rb#L23-L25
Is there a way we can check if these tests ever ran on CI?
What seems to be happening
I do not understand much of how PTY works, but what I have identified is that:
- The test hangs at
net.get("/"). Using a bit of debugging, I confirmed that I was not able tocurl http://localhost:3000after spawningrails server. - It appears the process is entering a lock where:
- it will not continue processing and complete the request until
rails server's output has been "consumed" (read,readlines) out of theIOobject returned byPTY.new. - we won't be reading more output out of
primarysincenet.get("/")is blocked
- it will not continue processing and complete the request until
The workaround
I have no idea whether this is an acceptable fix, but parallelizing, and allowing our test thread to try and read (blocking) the PTY's output, while a different thread makes the HTTP request (blocking too), seems to be fixing the problem.
You can try running any of the two commands I shared above and confirm the test will succeed now.
Questions
- Is there a way we can check if these tests ever ran on CI?
- Is my fix an acceptable approach?
- Has this test ever worked in the past? (Locally/On CI)
- Has anything changed (maybe in Puma), making Puma's output blocking, and preventing it to continue processing a request until all its output has been consumed?
Motivation / Background
This Pull Request has been created because [REPLACE ME]
Checklist
Before submitting the PR make sure the following are checked:
- [x] This Pull Request is related to one change. Unrelated changes should be opened in separate PRs.
- [x] Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex:
[Fix #issue-number] - [x] Tests are added or updated if you fix a bug or add a feature.
- [x] CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.
I was under the impression that buildkite/ci was using a tty and therefor these tests were running. @yahonda can probably verify that.
I was under the impression that buildkite/ci was using a tty and therefor these tests were running. @yahonda can probably verify that.
No you're right. I made a guess but I was wrong. For example on the CI build for this branch, I can see the test/engine/commands_test.rb test file has 6 runs, which matches.(here)
Now the question is, why do those tests do not hang on CI? @zzak I'm sure you've you tried running them locally before, do they still run on your machine?
I'm unable to reproduce it locally:
bin/test -v test/engine/commands_test.rb:62
Run options: -v --seed 6467
# Running:
Rails::Engine::CommandsTest#test_server_command_broadcast_logs = 2.14 s = .
Finished in 2.145408s, 0.4661 runs/s, 2.7967 assertions/s.
1 runs, 6 assertions, 0 failures, 0 errors, 0 skips
I wonder if it has something to do with the spawn_command in that file being a unique case, compared to the other tests. That is just a hunch though.
https://github.com/rails/rails/blob/b8720e7cb8c9894d142b8576c21065c92cd1907a/railties/test/isolation/abstract_unit.rb#L345
@zzak I just rebooted my laptop to be sure. Haven't got any Rails app running. The test still hangs.
(macOS Sequoia, Ruby 3.3.5)
While the test is hanging, I can see ruby listening to the port 3000, which disappears as soon as I kill the hanging test:
$ lsof -i -P | grep LISTEN | grep ruby
ruby 4656 sto 7u IPv4 0x3ba087057954dfb8 0t0 TCP localhost:3000 (LISTEN)
ruby 4656 sto 8u IPv6 0xc7ccd4d0b84fd561 0t0 TCP localhost:3000 (LISTEN)
ruby 4656 sto 9u IPv4 0x4593af10ba1eeabc 0t0 TCP localhost:3000 (LISTEN)
ruby 4656 sto 10u IPv4 0x405f2a704fdff54e 0t0 TCP connectivity-check.warp-svc:3000 (LISTEN)
ruby 4656 sto 11u IPv4 0xe41de747218cb552 0t0 TCP connectivity-check.warp-svc:3000 (LISTEN)
Update: tried on a different Mac. Same. Will try on a Linux machine soon.
Update again: took a while to set up and run, but yeah, this test passes on Ubuntu 24.04 with Ruby 3.3.5. Takes something like 13 seconds though.
I'm on linux, but I remember there being some differences in the native threads they used. Worst case, we can try to spike on a macos runner on GH actions to get it to repro.
Yep, seems to be a macOS issue, since I managed to get it to work on Linux too.
@davidstosik - if you add a signal handler that dumps the thread backtraces, it might help track down where the test is hanging.
Something like:
Signal.trap "SIGPROF" do
Thread.list.each do |thread|
puts thread.name;
puts thread.backtrace;
puts
end
end
Then send a PROF signal to the test process when it is hanging.
Thanks @djmb! I added this to the beginning of the hanging test:
Signal.trap "SIGPROF" do
Thread.list.each do |thread|
puts "🌊" * 10
puts thread.name
puts thread.backtrace
puts
end
end
Ran the test until it hands, then executed kill -s SIGPROF {the hanged process pid}. Results are:
Rails::Engine::CommandsTest#test_server_command_broadcast_logs = 🌊🌊🌊🌊🌊🌊🌊🌊🌊🌊
/Users/sto/src/github.com/rails/rails/railties/test/engine/commands_test.rb:67:in `backtrace'
/Users/sto/src/github.com/rails/rails/railties/test/engine/commands_test.rb:67:in `block (2 levels) in test_server_command_broadcast_logs'
/Users/sto/src/github.com/rails/rails/railties/test/engine/commands_test.rb:64:in `each'
/Users/sto/src/github.com/rails/rails/railties/test/engine/commands_test.rb:64:in `block in test_server_command_broadcast_logs'
/Users/sto/.gem/ruby/3.3.1/gems/net-protocol-0.2.2/lib/net/protocol.rb:229:in `wait_readable'
/Users/sto/.gem/ruby/3.3.1/gems/net-protocol-0.2.2/lib/net/protocol.rb:229:in `rbuf_fill'
/Users/sto/.gem/ruby/3.3.1/gems/net-protocol-0.2.2/lib/net/protocol.rb:199:in `readuntil'
/Users/sto/.gem/ruby/3.3.1/gems/net-protocol-0.2.2/lib/net/protocol.rb:209:in `readline'
/opt/rubies/3.3.1/lib/ruby/3.3.0/net/http/response.rb:158:in `read_status_line'
/opt/rubies/3.3.1/lib/ruby/3.3.0/net/http/response.rb:147:in `read_new'
/opt/rubies/3.3.1/lib/ruby/3.3.0/net/http.rb:2342:in `block in transport_request'
/opt/rubies/3.3.1/lib/ruby/3.3.0/net/http.rb:2333:in `catch'
/opt/rubies/3.3.1/lib/ruby/3.3.0/net/http.rb:2333:in `transport_request'
/opt/rubies/3.3.1/lib/ruby/3.3.0/net/http.rb:2306:in `request'
/opt/rubies/3.3.1/lib/ruby/3.3.0/net/http.rb:2299:in `block in request'
/opt/rubies/3.3.1/lib/ruby/3.3.0/net/http.rb:1570:in `start'
/opt/rubies/3.3.1/lib/ruby/3.3.0/net/http.rb:2297:in `request'
/opt/rubies/3.3.1/lib/ruby/3.3.0/net/http.rb:1917:in `get'
/Users/sto/src/github.com/rails/rails/railties/test/engine/commands_test.rb:77:in `block in test_server_command_broadcast_logs'
<internal:kernel>:90:in `tap'
/Users/sto/src/github.com/rails/rails/railties/test/engine/commands_test.rb:76:in `test_server_command_broadcast_logs'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest/test.rb:94:in `block (2 levels) in run'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest/test.rb:190:in `capture_exceptions'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest/test.rb:89:in `block in run'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest.rb:367:in `time_it'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest/test.rb:88:in `run'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest.rb:1207:in `run_one_method'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest.rb:446:in `run_one_method'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest.rb:433:in `block (2 levels) in run'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest.rb:429:in `each'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest.rb:429:in `block in run'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest.rb:471:in `on_signal'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest.rb:458:in `with_info_handler'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest.rb:428:in `run'
/Users/sto/src/github.com/rails/rails/railties/lib/rails/test_unit/line_filtering.rb:10:in `run'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest.rb:331:in `block in __run'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest.rb:331:in `map'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest.rb:331:in `__run'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest.rb:287:in `run'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest.rb:85:in `block in autorun'
🌊🌊🌊🌊🌊🌊🌊🌊🌊🌊
<internal:thread_sync>:18:in `pop'
/Users/sto/.gem/ruby/3.3.1/gems/minitest-5.25.1/lib/minitest/parallel.rb:30:in `block (2 levels) in start'
# (abridging: the backtrace above appears 11 times)
🌊🌊🌊🌊🌊🌊🌊🌊🌊🌊
Timeout stdlib thread
/Users/sto/.gem/ruby/3.3.1/gems/timeout-0.4.1/lib/timeout.rb:110:in `sleep'
/Users/sto/.gem/ruby/3.3.1/gems/timeout-0.4.1/lib/timeout.rb:110:in `wait'
/Users/sto/.gem/ruby/3.3.1/gems/timeout-0.4.1/lib/timeout.rb:110:in `block (2 levels) in create_timeout_thread'
/Users/sto/.gem/ruby/3.3.1/gems/timeout-0.4.1/lib/timeout.rb:108:in `synchronize'
/Users/sto/.gem/ruby/3.3.1/gems/timeout-0.4.1/lib/timeout.rb:108:in `block in create_timeout_thread'
(Most threads don't seem to have a name., but at least I have traces.)
Looking at the first thread's backtrace:
...
/opt/rubies/3.3.1/lib/ruby/3.3.0/net/http/response.rb:158:in `read_status_line'
/opt/rubies/3.3.1/lib/ruby/3.3.0/net/http/response.rb:147:in `read_new'
...
/opt/rubies/3.3.1/lib/ruby/3.3.0/net/http.rb:1570:in `start'
/opt/rubies/3.3.1/lib/ruby/3.3.0/net/http.rb:2297:in `request'
/opt/rubies/3.3.1/lib/ruby/3.3.0/net/http.rb:1917:in `get'
/Users/sto/src/github.com/rails/rails/railties/test/engine/commands_test.rb:77:in `block in test_server_command_broadcast_logs'
...
It does look like the test is stuck on net.get("/").
I wonder if just adding a sleep after spawning the process also solves it for you, like the process may have output "Listening on..." but it's not accepting connections yet causing the request to hang.
@zzak I tried adding sleep 5 after assert_output("Listening on", primary), before Net::HTTP.new, but the test is still hanging.
I'm still not pretending I understand whatever's happening with PTY, but for the record, the change below would work as well.
I'm not suggesting we should use it, but it might help understand where the problem is?
diff --git i/railties/test/console_helpers.rb w/railties/test/console_helpers.rb
index 10d3a54602..b871051aad 100644
--- i/railties/test/console_helpers.rb
+++ w/railties/test/console_helpers.rb
@@ -12,7 +12,7 @@ def assert_output(expected, io, timeout = 10)
output = +""
until output.include?(expected) || Time.now > timeout
if IO.select([io], [], [], 0.1)
- output << io.read(1)
+ output << (io.read(1) || "")
end
end
diff --git i/railties/test/engine/commands_test.rb w/railties/test/engine/commands_test.rb
index 3e888ce0d8..1289759de3 100644
--- i/railties/test/engine/commands_test.rb
+++ w/railties/test/engine/commands_test.rb
@@ -60,9 +60,23 @@ def test_server_command_work_inside_engine
end
def test_server_command_broadcast_logs
primary, replica = PTY.open
- pid = spawn_command("server", replica, env: { "RAILS_ENV" => "development" })
- assert_output("Listening on", primary)
+ out_io = File.open("test_out", "w")
+ pid = spawn_command(
+ "server", replica, env: { "RAILS_ENV" => "development" },
+ out_io: out_io
+ )
+
+ assert_output("Listening on", File.open("test_out", "r"))
Net::HTTP.new("127.0.0.1", 3000).tap do |net|
net.get("/")
@@ -73,7 +87,7 @@ def test_server_command_broadcast_logs
assert_match("Processing by Rails::WelcomeController", logs)
end
- assert_output("Processing by Rails::WelcomeController", primary)
+ assert_output("Processing by Rails::WelcomeController", File.open("test_out", "r"))
ensure
kill(pid)
end
@@ -84,9 +98,10 @@ def plugin_path
"#{@destination_root}/bukkits"
end
- def spawn_command(command, fd, env: {})
+ def spawn_command(command, fd, env: {}, out_io: nil)
+ out_io ||= fd
in_plugin_context(plugin_path) do
- Process.spawn(env, "bin/rails #{command}", in: fd, out: fd, err: fd)
+ Process.spawn(env, "bin/rails #{command}", in: fd, out: out_io, err: fd)
end
end
As you can see, if I spawn the process to write into a file instead of the replica file descriptor returned by PTY.open, and if I read to that file instead of reading from the primary file descriptor, then the test works. (Whereas, I checked, it still does not work for me on macOS.)
I wrote a lot more than what's below, but now I've identified the issue more clearly, what's next should suffice.
If you want to see the process that got me here, you can expand this.
Here's another experiment: on main, apply this diff:
diff --git i/railties/test/engine/commands_test.rb w/railties/test/engine/commands_test.rb
index 3e888ce0d8..a740054139 100644
--- i/railties/test/engine/commands_test.rb
+++ w/railties/test/engine/commands_test.rb
@@ -60,9 +60,18 @@ def test_server_command_work_inside_engine
end
def test_server_command_broadcast_logs
+ in_plugin_context(plugin_path) do
+ puts
+ puts `pwd`.strip
+ puts "Press Enter to continue"
+ $stdin.gets
+ end
+
primary, replica = PTY.open
pid = spawn_command("server", replica, env: { "RAILS_ENV" => "development" })
+
assert_output("Listening on", primary)
+ puts "Press Enter to continue"
Net::HTTP.new("127.0.0.1", 3000).tap do |net|
net.get("/")
It outputs the path to the dummy app used in the test, then pauses the test in two places:
- Before starting the Rails server.
- Before issuing the HTTP request to
/.
This is how it goes:
-
Run the test.
-
At the first pause, I can open a second terminal and run
tail -F {dummy app path}/test/dummy/log/development.logto see what the Rails app outputs in real time. -
The file starts empty.
-
Then I hit Enter in the test process, to proceed to the Rails application start, and stop before making the HTTP request. The log file stays empty.
-
Then I press Enter again, and the
tail -Fprocess watching the log file now outputs this:Started GET "/" for 127.0.0.1 at 2024-10-30 22:55:38 +0900 (0.8ms) CREATE TABLE "schema_migrations" ("version" varchar NOT NULL PRIMARY KEY) /*application='Dummy'*/ (0.1ms) CREATE TABLE "ar_internal_metadata" ("key" varchar NOT NULL PRIMARY KEY, "value" varchar, "created_at" datetime(6) NOT NULL, "updated_at" datetime(6) NOT NULL) /*application='Dummy'*/ ActiveRecord::SchemaMigration Load (0.0ms) SELECT "schema_migrations"."version" FROM "schema_migrations" ORDER BY "schema_migrations"."version" ASC /*application='Dummy'*/ Processing by Rails::WelcomeController#index as */* Rendering /Users/sto/src/github.com/rails/rails/railties/lib/rails/templates/rails/welcome/index.html.erb Rendered /Users/sto/src/github.com/rails/rails/railties/lib/rails/templates/rails/welcome/index.html.erb (Duration: 3.6ms | GC: 3.2ms) -
Then the test process hangs.
-
Finally, I hit Ctrl-C on the test process.
-
Once does nothing, so I do a second time.
-
The test process is killed.
-
At the same time, this line gets added at the bottom of the log file:
Completed 200 OK in 71931ms (Views: 71890.0ms | ActiveRecord: 0.0ms (0 queries, 0 cached) | GC: 3.2ms)
I don't know what it is, but it appears that the Rails server receives the request, renders its response, but is unable to complete?
If, instead of resuming the test execution after the second pause, I browse to http://localhost:3000, I get the same behaviour: Rails renders the page, but does not complete. On the browser side, spinner never stops. As soon as I kill the test, my browser shows the expected Rails Welcome page.
[!IMPORTANT]
TL;DR: macOS's pseudo terminal internal buffer size (that's a mouthful) is only 1kB, and that can't even accommodate the output generated by rails server booting and serving a single request. In other common OSes, for example Debian-based Linux, it'll be way more than enough, like 19kB, 131kB... Since we can't guarantee that buffer size will not decrease in the future, or that Rails server's output will not outgrow that size, I think the fix I originally suggested (use a thread so we can consume the output as it is produced) is justified.
Here's one last experiment.
Just add primary.read(196) before sending the HTTP request:
diff --git c/railties/test/engine/commands_test.rb i/railties/test/engine/commands_test.rb
index 3e888ce0d8..1781650704 100644
--- c/railties/test/engine/commands_test.rb
+++ i/railties/test/engine/commands_test.rb
@@ -64,6 +64,8 @@ def test_server_command_broadcast_logs
pid = spawn_command("server", replica, env: { "RAILS_ENV" => "development" })
assert_output("Listening on", primary)
+ primary.read(196)
+
Net::HTTP.new("127.0.0.1", 3000).tap do |net|
net.get("/")
end
This is another one of those things that are not a proper fix, but help understanding what's happening.
I empties primary buffer from all the stuff Rails outputs at startup, and apparently, that's all Rails needs to be able to continue serving the request and completing the response, making the test pass successfully.
Understanding a bit more about the issue, I wrote this simple Ruby script:
require "pty"
(500..).each do |size|
size = size*10
puts "Spawning a process that outputs #{size} characters."
primary, replica = PTY.open
pid = Process.spawn("echo -n #{'a' * size}", in: replica, out: replica, err: replica)
Process.wait(pid)
sleep 0.1
end
Running this script will keep spawning an echo aaaa command with more as on a new PTY, until it hangs. It hangs at 1030 characters. I dialed it down and identified that spawning a process that echoes 1024 characters works, but spawning a process that echoes 1025 characters will hang!
So PTY has a 1kB buffer and hangs when that buffer is full? Is this a macOS limitation? What would be that number on Linux? Can we change it?
I found this superuser post: Ptys (Pseudo terminals) internal buffer size, which as script that confirms my assumptions:
- On macOS, the pseudo terminal internal buffer size is 1kB (confirmed by running
ptsbufsize.py). - On Debian, it varies depending on the version, but seems to be way higher (19kB, 131kB...)
- Overall, I guess we can't really trust it to be big enough?
Going back to our test, here's what the Rails server outputs:
- When it starts:
That's 533 characters.=> Booting Puma => Rails 8.1.0.alpha application starting in development => Run `bin/rails server --help` for more startup options Puma starting in single mode... * Puma version: 6.4.3 (ruby 3.3.1-p55) ("The Eagle of Durango") * Min threads: 3 * Max threads: 3 * Environment: development * PID: 76984 * Listening on http://127.0.0.1:3000 * Listening on http://[::1]:3000 * Listening on http://127.255.255.0:3000 * Listening on http://127.0.2.2:3000 * Listening on http://127.0.2.3:3000 Use Ctrl-C to stop - When it receives the request to
/:
That's 1023 characters (including color escaping codes). It's a freaky coincidence that that number would be just below the 1kB limit!Started GET "/" for 127.0.0.1 at 2024-10-30 23:58:29 +0900 (0.9ms) CREATE TABLE "schema_migrations" ("version" varchar NOT NULL PRIMARY KEY) /*application='Dummy'*/ (0.1ms) CREATE TABLE "ar_internal_metadata" ("key" varchar NOT NULL PRIMARY KEY, "value" varchar, "created_at" datetime(6) NOT NULL, "updated_at" datetime(6) NOT NULL) /*application='Dummy'*/ ActiveRecord::SchemaMigration Load (0.0ms) SELECT "schema_migrations"."version" FROM "schema_migrations" ORDER BY "schema_migrations"."version" ASC /*application='Dummy'*/ Processing by Rails::WelcomeController#index as */* Rendering /Users/sto/src/github.com/rails/rails/railties/lib/rails/templates/rails/welcome/index.html.erb Rendered /Users/sto/src/github.com/rails/rails/railties/lib/rails/templates/rails/welcome/index.html.erb (Duration: 0.5ms | GC: 0.1ms) Completed 200 OK in 67ms (Views: 2.8ms | ActiveRecord: 0.0ms (0 queries, 0 cached) | GC: 3.7ms)
Calling assert_output "eats" all characters in the buffer until it can find the string it looked for. So assert_output("Listening to") will eat 327 characters, which explains why I can fix the test with primary.read(196)(533-327 = 196).
So what can we do?
- Add
primary.read(196)to consume as much output as possible before we launch the HTTP request. I've confirmed above that it works. - Use an
assert_outputthat will "eat" more of the early output. Ideally, we canassert_output("Use Ctrl-C to stop\r"to consume all the Rails server's boot output.
In my situation however, the size of the Rails server output is so big (it depends on the paths where the partials are), that even not including \r from the assert_output above would make the output bigger than 1024 characters long, hanging the process.
I tried cloning the Rails repository in a shorter path, to see if my long paths were to blame:
gh repo clone rails/rails /tmp/rails
cd /tmp/rails/railties
bin/test -v test/engine/commands_test.rb:62
Unfortunately, the output is still too long without a fix, and the test hangs.
It looks like all the fix ideas I could come up with above are limited, and not guaranteed to continue to work in the future:
- what if the pseudo terminal buffer size is decreased again in the future?
- what if Rails server outputs more stuff to the screen when serving a request?
In conclusion, I think the fix I'm suggesting in this PR is reasonable: run the HTTP request in a separate thread so we can keep consuming the rails server process' output as it is produced.
@davidstosik Thanks for your detailed investigation.
I was able to reproduce the issue using a macos runner on github actions, and indeed your fix to use a thread to make the http request works well: https://github.com/zzak/rails/actions/runs/11639390507
Here is my config for reference:
name: Debug railtie test on macos
on:
push:
workflow_dispatch:
jobs:
test:
runs-on: macos-latest
steps:
- name: checkout
uses: actions/checkout@v3
- name: setup-ruby
uses: ruby/setup-ruby@v1
with:
ruby-version: 3.3
bundler-cache: true
env:
BUNDLE_WITHOUT: "db:job"
- name: run test
env:
BUNDLE_WITHOUT: "db:job"
run: |
cd railties
bin/test test/engine/commands_test.rb:62
It indeed seems like this issue is due to pty, and there are few tests which rely on it, none of them are making a consecutive http request. I was thinking this might affect other tests, or that we should try to make it re-useable in case this comes up again. But after running the majority of the tests on that runner, and checking server_test, console_test, and dbconsole_test, this seems like a rare case and this patch is probably safe to merge.
We can generalize if/when we find other hanging tests later. 👍🏻
Great investigation on this!
I do think we should still go with a thread wrapping the get, even with the pipe: we're in less danger of platform variance etc, but I think your first instinct is still correct, in that it's "wrong" to perform a blocking operation and just hope that there's enough buffer space to handle whatever it produces before we start reading.
@matthewd Thank you for the review!
I checked IO.pipe, and indeed, it also has a buffer size limit, although way bigger than PTY: it's 64kB, which feels a lot more comfortable than PTY's 1kB!
Happy to restore the earlier threaded approach if that still feels better.