matrix-rust-sdk icon indicating copy to clipboard operation
matrix-rust-sdk copied to clipboard

chore(ci): run tarpaulin in release mode

Open bnjbvr opened this issue 1 year ago • 7 comments

This is a small experiment: it turns out compiling is relatively quick, on a PR I just did (3 minutes), but running the tarpaulin build took around 8 minutes. I'm wondering if increasing the compile times and lowering the run time is worth it, hence this experiment.

bnjbvr avatar Oct 16 '24 13:10 bnjbvr

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 78.00%. Comparing base (30f3a3c) to head (3082fc8). Report is 60 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4139      +/-   ##
==========================================
- Coverage   84.72%   78.00%   -6.73%     
==========================================
  Files         269      268       -1     
  Lines       28779    24773    -4006     
==========================================
- Hits        24384    19325    -5059     
- Misses       4395     5448    +1053     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Oct 16 '24 13:10 codecov[bot]

Results of the experiment:

  • baseline:
    • incremental build: ~3min30s
    • run: ~7min
  • compiling with O1:
    • non-incremental build: 15min12s
    • run: ~5min24s
  • compiling with O2:
    • non-incremental build: 29min19s
    • run: ~5min30s

Since the build times are not fair comparisons (this PR could only have non-incremental build times, while for the baseline we get incremental compile times, I'm curious if others would like to enable the O1 level on main for a day or two, compare how it fares in terms of incremental compile times, and revert it if it's definitely not an improvement? cc @poljar

bnjbvr avatar Oct 17 '24 09:10 bnjbvr

Results of the experiment:

* baseline:
  
  * incremental build: ~3min30s
  * run: ~7min

* compiling with O1:
  
  * non-incremental build: 15min12s
  * run: ~5min24s

* compiling with O2:
  
  * non-incremental build: 29min19s
  * run: ~5min30s

Since the build times are not fair comparisons (this PR could only have non-incremental build times, while for the baseline we get incremental compile times, I'm curious if others would like to enable the O1 level on main for a day or two, compare how it fares in terms of incremental compile times, and revert it if it's definitely not an improvement? cc @poljar

Did you measure this locally or on CI? If we enable this on the CI how do we measure and compare this to our previous way, surely not via eyeballing?

poljar avatar Oct 17 '24 10:10 poljar

Eyeballing the wall clock times reported in the log statements 👀 Even if not precise at the very second, these give us the right orders of magnitude IMO. What I'd really want is likely to get an idea of the cost of an incremental build, and that could be done locally if someone were very bored…

bnjbvr avatar Oct 17 '24 12:10 bnjbvr

Ran some things during launch, to have a rough idea of how this would behave on my machine, in terms of compile times at least:

Task Wall-clock time User-time
tarpaulin -O0 initial 116 seconds 15.8 minutes
tarpaulin -O0 rebuild 20.23 seconds 65.85 seconds
tarpaulin -O1 initial 272 seconds 72.44 minutes
tarpaulin -O1 rebuild 73 seconds 15.44 minutes

So, while this doesn't tell us what kind of compile-time slowdown would be caused on CI (especially since my machine has lots of CPU cores, and I suspect CI doesn't), it tells us a rebuild might be between > ~3 and 14~ times slower than it is right now, and I suspect this would cancel the effects of having a faster runtime.

I'd still be inclined to try and merge this in production for a short while (a few days), rather than making back-of-the-envelope calculation like this, if others are interested. Otherwise I think it's ok to close this PR, there are other build-time optimizations to chase.

bnjbvr avatar Oct 22 '24 13:10 bnjbvr

We chatted briefly about this, and we think it's an easy revert in case it's a regression. Let's find out :)

bnjbvr avatar Oct 22 '24 14:10 bnjbvr

A NEW RACE CONDITION APPEARS!

thread 'tests::timeline::test_enabling_backups_retries_decryption' panicked at testing/matrix-sdk-integration-testing/src/tests/timeline.rs:375:10:
We should be able to send a message to our new room: UnknownError("EncryptionNotEnabled")

bnjbvr avatar Oct 22 '24 14:10 bnjbvr

  1. The bug observed in the codecov task was an Heisenbug, since as soon as I added logging for debug, it disappeared.
  2. The regression in terms of coverage is because we start losing track of some lines that are effectively tested. Here's one example: https://app.codecov.io/gh/matrix-org/matrix-rust-sdk/pull/4139/blob/crates/matrix-sdk-base/src/rooms/normal.rs#L1420 It can't be that line 1421 be tested without the other lines now marked as missed being reached (we have tests that will exercise these code paths indirectly, e.g. here).

Because of (2), I think we should close this. It's possible that enabling O1 enables some extra inlining that confuses the codecov instrumentation, thus losing track of locations in the coverage. It's a bit of an unfortunate outcome, but at least we learned a thing :smiling_face_with_tear:

bnjbvr avatar Oct 24 '24 10:10 bnjbvr