matrix-rust-sdk chore(ci): run tarpaulin in release mode

This is a small experiment: it turns out compiling is relatively quick, on a PR I just did (3 minutes), but running the tarpaulin build took around 8 minutes. I'm wondering if increasing the compile times and lowering the run time is worth it, hence this experiment.

Oct 16 '24 13:10 bnjbvr

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 78.00%. Comparing base (30f3a3c) to head (3082fc8). Report is 60 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4139      +/-   ##
==========================================
- Coverage   84.72%   78.00%   -6.73%     
==========================================
  Files         269      268       -1     
  Lines       28779    24773    -4006     
==========================================
- Hits        24384    19325    -5059     
- Misses       4395     5448    +1053

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Oct 16 '24 13:10 codecov[bot]

Results of the experiment:

baseline:
- incremental build: ~3min30s
- run: ~7min
compiling with O1:
- non-incremental build: 15min12s
- run: ~5min24s
compiling with O2:
- non-incremental build: 29min19s
- run: ~5min30s

Since the build times are not fair comparisons (this PR could only have non-incremental build times, while for the baseline we get incremental compile times, I'm curious if others would like to enable the O1 level on main for a day or two, compare how it fares in terms of incremental compile times, and revert it if it's definitely not an improvement? cc @poljar

Oct 17 '24 09:10 bnjbvr

Results of the experiment:
* baseline:
  
  * incremental build: ~3min30s
  * run: ~7min

* compiling with O1:
  
  * non-incremental build: 15min12s
  * run: ~5min24s

* compiling with O2:
  
  * non-incremental build: 29min19s
  * run: ~5min30s
Since the build times are not fair comparisons (this PR could only have non-incremental build times, while for the baseline we get incremental compile times, I'm curious if others would like to enable the O1 level on main for a day or two, compare how it fares in terms of incremental compile times, and revert it if it's definitely not an improvement? cc @poljar

Did you measure this locally or on CI? If we enable this on the CI how do we measure and compare this to our previous way, surely not via eyeballing?

Oct 17 '24 10:10 poljar

Eyeballing the wall clock times reported in the log statements 👀 Even if not precise at the very second, these give us the right orders of magnitude IMO. What I'd really want is likely to get an idea of the cost of an incremental build, and that could be done locally if someone were very bored…

Oct 17 '24 12:10 bnjbvr

Ran some things during launch, to have a rough idea of how this would behave on my machine, in terms of compile times at least:

Task	Wall-clock time	User-time
tarpaulin -O0 initial	116 seconds	15.8 minutes
tarpaulin -O0 rebuild	20.23 seconds	65.85 seconds
tarpaulin -O1 initial	272 seconds	72.44 minutes
tarpaulin -O1 rebuild	73 seconds	15.44 minutes

So, while this doesn't tell us what kind of compile-time slowdown would be caused on CI (especially since my machine has lots of CPU cores, and I suspect CI doesn't), it tells us a rebuild might be between > ~3 and 14~ times slower than it is right now, and I suspect this would cancel the effects of having a faster runtime.

I'd still be inclined to try and merge this in production for a short while (a few days), rather than making back-of-the-envelope calculation like this, if others are interested. Otherwise I think it's ok to close this PR, there are other build-time optimizations to chase.

Oct 22 '24 13:10 bnjbvr

We chatted briefly about this, and we think it's an easy revert in case it's a regression. Let's find out :)

Oct 22 '24 14:10 bnjbvr

A NEW RACE CONDITION APPEARS!

thread 'tests::timeline::test_enabling_backups_retries_decryption' panicked at testing/matrix-sdk-integration-testing/src/tests/timeline.rs:375:10:
We should be able to send a message to our new room: UnknownError("EncryptionNotEnabled")

Oct 22 '24 14:10 bnjbvr

The bug observed in the codecov task was an Heisenbug, since as soon as I added logging for debug, it disappeared.
The regression in terms of coverage is because we start losing track of some lines that are effectively tested. Here's one example: https://app.codecov.io/gh/matrix-org/matrix-rust-sdk/pull/4139/blob/crates/matrix-sdk-base/src/rooms/normal.rs#L1420 It can't be that line 1421 be tested without the other lines now marked as missed being reached (we have tests that will exercise these code paths indirectly, e.g. here).

Because of (2), I think we should close this. It's possible that enabling O1 enables some extra inlining that confuses the codecov instrumentation, thus losing track of locations in the coverage. It's a bit of an unfortunate outcome, but at least we learned a thing :smiling_face_with_tear:

Oct 24 '24 10:10 bnjbvr