router icon indicating copy to clipboard operation
router copied to clipboard

Modify fmt_layer on_event() processing to help macOS

Open garypen opened this issue 1 year ago • 6 comments

Empirically determine that it's not a good idea to try to write more than 1_024 bytes to a non-blocking fd on macOS.

Modify the on_event() processing to take account of this.

garypen avatar Feb 23 '24 14:02 garypen

@garypen, please consider creating a changeset entry in /.changesets/. These instructions describe the process and tooling.

github-actions[bot] avatar Feb 23 '24 14:02 github-actions[bot]

CI performance tests

  • [ ] reload - Reload test over a long period of time at a constant rate of users
  • [ ] events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • [ ] events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • [ ] large-request - Stress test with a 1 MB request payload
  • [x] const - Basic stress test that runs with a constant number of users
  • [ ] no-graphos - Basic stress test, no GraphOS.
  • [ ] step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • [ ] events - Stress test for events with a lot of users and deduplication ENABLED
  • [ ] events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • [ ] events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • [ ] events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • [ ] xxlarge-request - Stress test with 100 MB request payload
  • [ ] xlarge-request - Stress test with 10 MB request payload
  • [x] step - Basic stress test that steps up the number of users over time

router-perf[bot] avatar Feb 23 '24 14:02 router-perf[bot]

Hmm, I still get test failures with this, for example with:

cargo run --test telemetry_test bad_query -- --nocapture

but that might be a second issue?

goto-bus-stop avatar Mar 05 '24 12:03 goto-bus-stop

Hmm, I still get test failures with this, for example with:

cargo run --test telemetry_test bad_query -- --nocapture

but that might be a second issue?

You only get the test failure with --nocapture, so this won't generally be picked up in our automated tests.

I ran cargo test --test telemetry_test bad_query -- --nocapture and sure enough it failed. That's because I've only fixed the fmt_layer (i.e.: logging) part of the router IO. The test exercises stderr and breaks as you would expect.

If you make this change:

garypen@Garys-MacBook-Pro router % git diff diff --git a/apollo-router/src/executable.rs b/apollo-router/src/executable.rs index 1be97f684..3e10a4d60 100644 --- a/apollo-router/src/executable.rs +++ b/apollo-router/src/executable.rs @@ -443,7 +443,7 @@ impl Executable { #[cfg(not(target_os = "windows"))] { let _ = set_blocking(libc::STDOUT_FILENO, false);

  •        let _ = set_blocking(libc::STDERR_FILENO, false);
    
  •        // let _ = set_blocking(libc::STDERR_FILENO, false);
       }
    

then the test will pass.

garypen avatar Mar 05 '24 14:03 garypen

Maybe a better approach is to forget about non-blocking IO on macOS as well as Windows?

garypen avatar Mar 05 '24 14:03 garypen

Maybe a better approach is to forget about non-blocking IO on macOS as well as Windows?

I think that makes sense

goto-bus-stop avatar Mar 06 '24 09:03 goto-bus-stop

We reverted the router change so it isn't using non-blocking IO, so we don't need this. If we do have non-blocking IO in the router at some point, it will be more comprehensive than the reverted change, so we won't need this.

garypen avatar Mar 28 '24 08:03 garypen