synapse icon indicating copy to clipboard operation
synapse copied to clipboard

Increase timeout for test_lock_contention on RISC-V

Open Gui-Yue opened this issue 6 months ago • 1 comments

This PR addresses a test failure for tests.handlers.test_worker_lock.WorkerLockTestCase.test_lock_contention which consistently times out on the RISC-V (specifically riscv64) architecture.

The test simulates high lock contention and has a default timeout of 5 seconds, which seems sufficient for architectures like x86_64 but proves too short for current RISC-V hardware/environment performance characteristics, leading to spurious tests.utils.TestTimeout failures.

This fix introduces architecture detection using platform.machine(). If a RISC-V architecture is detected:

  • The timeout for this specific test is increased (e.g., to 15 seconds ).

The original, stricter timeout (5 seconds) and lock count (500) are maintained for all other architectures to avoid masking potential performance regressions elsewhere.

This change has been tested locally on RISC-V, where the test now passes reliably, and on x86_64, where it continues to pass with the original constraints.


Pull Request Checklist

  • [X] Pull request is based on the develop branch (Assuming you based it correctly)
  • [X] Pull request includes a changelog file. (See below)
  • [X] Code style is correct (run the linters) (Please run linters locally)

Gui-Yue avatar May 12 '25 14:05 Gui-Yue

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar May 12 '25 14:05 CLAassistant

I noticed that lint test failed. The log shows that "tests/handlers/test_worker_lock.py:22:1: I001 Import block is un-sorted or un-formatted".Could you tell me what rules are followed in the order of python package imports? @devonh

Gui-Yue avatar May 22 '25 14:05 Gui-Yue

I noticed that lint test failed. The log shows that "tests/handlers/test_worker_lock.py:22:1: I001 Import block is un-sorted or un-formatted".Could you tell me what rules are followed in the order of python package imports? @devonh

Hey! The best way to sort out these issues would be to run the lints yourself locally. This is described here: https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters But basically just boils down to:

poetry run ./scripts-dev/lint.sh

Running that should catch all the same things that CI is looking at.

devonh avatar May 22 '25 14:05 devonh

For a little context: this happens because the timeout is based on wall clock time. This will be flakey if run on lower end systems or systems with high load. I was able to pretty reliably simulate this on a x86_64 with CPU 8/16 when ran with -j48. ~~I ended up bumping it up to 30 seconds, myself.~~ Scratch that, I settled on 15 seconds.

realtyem avatar May 22 '25 16:05 realtyem

could you please retrigger the tests? by the way, this issue was catched by debian, full log can be found in https://buildd.debian.org/status/fetch.php?pkg=matrix-synapse&arch=riscv64&ver=1.128.0-1&stamp=1746950558&raw=0. @devonh

Gui-Yue avatar May 26 '25 00:05 Gui-Yue

I think it's unhappy with the newline at the end of the newsfile now. It's not the most robust tool. :upside_down_face:

devonh avatar May 26 '25 14:05 devonh

I think it's unhappy with the newline at the end of the newsfile now. It's not the most robust tool. 🙃

Because the file end with "space",I had not noticed.Now it has been fixed.Please retrigger the test. @devonh

Gui-Yue avatar May 27 '25 03:05 Gui-Yue