router
router copied to clipboard
Modify fmt_layer on_event() processing to help macOS
Empirically determine that it's not a good idea to try to write more than 1_024 bytes to a non-blocking fd on macOS.
Modify the on_event() processing to take account of this.
@garypen, please consider creating a changeset entry in /.changesets/. These instructions describe the process and tooling.
CI performance tests
- [ ] reload - Reload test over a long period of time at a constant rate of users
- [ ] events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
- [ ] events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
- [ ] large-request - Stress test with a 1 MB request payload
- [x] const - Basic stress test that runs with a constant number of users
- [ ] no-graphos - Basic stress test, no GraphOS.
- [ ] step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
- [ ] events - Stress test for events with a lot of users and deduplication ENABLED
- [ ] events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
- [ ] events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
- [ ] events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
- [ ] xxlarge-request - Stress test with 100 MB request payload
- [ ] xlarge-request - Stress test with 10 MB request payload
- [x] step - Basic stress test that steps up the number of users over time
Hmm, I still get test failures with this, for example with:
cargo run --test telemetry_test bad_query -- --nocapture
but that might be a second issue?
Hmm, I still get test failures with this, for example with:
cargo run --test telemetry_test bad_query -- --nocapturebut that might be a second issue?
You only get the test failure with --nocapture, so this won't generally be picked up in our automated tests.
I ran cargo test --test telemetry_test bad_query -- --nocapture and sure enough it failed. That's because I've only fixed the fmt_layer (i.e.: logging) part of the router IO. The test exercises stderr and breaks as you would expect.
If you make this change:
garypen@Garys-MacBook-Pro router % git diff diff --git a/apollo-router/src/executable.rs b/apollo-router/src/executable.rs index 1be97f684..3e10a4d60 100644 --- a/apollo-router/src/executable.rs +++ b/apollo-router/src/executable.rs @@ -443,7 +443,7 @@ impl Executable { #[cfg(not(target_os = "windows"))] { let _ = set_blocking(libc::STDOUT_FILENO, false);
-
let _ = set_blocking(libc::STDERR_FILENO, false);
-
// let _ = set_blocking(libc::STDERR_FILENO, false); }
then the test will pass.
Maybe a better approach is to forget about non-blocking IO on macOS as well as Windows?
Maybe a better approach is to forget about non-blocking IO on macOS as well as Windows?
I think that makes sense
We reverted the router change so it isn't using non-blocking IO, so we don't need this. If we do have non-blocking IO in the router at some point, it will be more comprehensive than the reverted change, so we won't need this.