Decrease `_DEFAULT_SAFE_OPEN_INTERVAL` (to what?)
A possible quick fix to not spend more of our time (at least for now...) on:
- #6544
- #6595
- #6596
While this is quite a minor change, as it has rather far-reaching consequences, we should bikeshed this at least for a week or so :D Any thoughts on the duration, @sphuber, @mbercx, @unkcpz, @t-reents? @giovannipizzi proposed 3s, but I'm worried this is a bit short, so I put 5s for now. Will add some benchmarks here later on.
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 77.61%. Comparing base (9256f2f) to head (1020c84).
:warning: Report is 7 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #6599 +/- ##
==========================================
+ Coverage 77.59% 77.61% +0.02%
==========================================
Files 566 566
Lines 43514 43547 +33
==========================================
+ Hits 33760 33793 +33
Misses 9754 9754
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Why change this default? If I understand correctly, the main problem is with the SSH plugin, right? The LocalTransport already changes this default as well, since it doesn't really need one. So wouldn't it make more sense to limit the scope and change the default on SshTransport. That being said, the whole point is that this is a safe default, it can be configured by a user to something lower. So why not just keep that? They can simply run verdi computer configure core.ssh -n --safe-interval 5.
Yes, good point, it should be changed in the SshTransport instead.
I think most users, especially new ones, will just be running with the default value. Especially if they go through verdi computer setup and then get
Report: Note: before the computer can be used, it has to be configured with the command:
Report: verdi -p <profile> computer configure core.ssh <computer>
they'll likely just copy-paste this command. Instead, if they run verdi computer configure core.ssh --help they will probably be overwhelmed with the amount of options and overlook it (or not even read through them at all, due to laziness). Then, running short test jobs like the ArithmeticAdd will take longer than they expect, and they might think they did something wrong during their setup.
The question is, is 30s really necessary? And if it's overkill, why not reduce it.
The question is, is 30s really necessary? And if it's overkill, why not reduce it.
It is put at that value to prevent users being banned from compute centers for opening too many SSH connections. Admittedly, this is more likely to happen when you are running high-throughput with multiple daemon workers, and I am not sure if that is the main use case anymore for AiiDA users. So we can lower it, with the risk that some users may get warnings or banned. If @giovannipizzi also signed off on changing the default to favoring casual users over protecting heavy high-throughput users by default, then it is fine by me as well.
In practice, as also the tests by @khsrali (i think in a different issue?) show, even with a very low value (like 0.1s, that I would not recommend) the number of open connections remains every low. This is because while a connection is not being opened, other tasks pile up and then reuse the same connection (@khsrali can you point to your tests?). For me 5s (or maybe even 3s) is OK, in any case a supercomputer center should not ban you if you connect every 5 seconds every now and then (e.g I could do it myself opening a few ssh shells). If I'm running mid throughput, requests will pile up and reuse the connection. If I'm really running high thoughput, probably I'll need to tweak a few options anyway. So I'd probably just check if 3s is too much once more and then change the default, and update the doc page on how to run high throughput, mentioning what this value is and that it might need to be modified to avoid being banned
The benchmarks by @khsrali can be found here: https://github.com/aiidateam/aiida-core/issues/6544#issuecomment-2407529994
@GeigerJ2 , do you have a moment to apply the changes? Thanks a lot!
Hi @khsrali, thanks for the ping here! Before we actually merge this change, I'd like to do some proper benchmarking (e.g., with Thor). That's why this has been on hold for a while now. Maybe we can do it together after the holidays? I remember you already did some benchmarks in the past 😉
@GeigerJ2 Is doing a benchmark is really necessarily for this? I suggest Just merge it, 5 second is long enough :smiling_face_with_tear:
:smiling_face_with_tear: 🤌