nimi-python
Some system tests take a very long time to run
nimi-bot system tests now run in parallel, with a job for each bitness (win32 vs win64) and for each module (nidmm, niscope, nidcpower, nise, etc.)
I noticed some jobs consistently take a very long time compared to the rest. Take the system tests run as part of #1258:
| Job | Duration | Agent |
|-----|----------|-------|
| system-tests/win32/nidcpower | 2 min 22 sec | nimibot-win-32-2 |
| system-tests/win32/nidigital | 2 min 5 sec | nimibot-win-32-1 |
| system-tests/win32/nidmm | 2 min 59 sec | nimibot-win-32-2 |
| system-tests/win32/nifgen | 3 min 48 sec | nimibot-win-32-2 |
| system-tests/win32/nimodinst | 1 min 59 sec | nimibot-win-32-1 |
| system-tests/win32/niscope | 3 min 40 sec | nimibot-win-32-1 |
| system-tests/win32/nise | 2 min 39 sec | nimibot-win-32-1 |
| system-tests/win32/niswitch | 8 min 58 sec | nimibot-win-32-2 |
| system-tests/win32/nitclk | 12 sec | nimibot-win-32-1 |
| system-tests/win64/nidcpower | 2 min 31 sec | nimibot-win-64-1 |
| system-tests/win64/nidigital | 3 min 1 sec | nimibot-win-64-1 |
| system-tests/win64/nidmm | 2 min 47 sec | nimibot-win-64-1 |
| system-tests/win64/nifgen | 4 min 24 sec | nimibot-win-64-1 |
| system-tests/win64/nimodinst | 2 min 20 sec | nimibot-win-64-2 |
| system-tests/win64/niscope | 3 min 37 sec | nimibot-win-64-2 |
| system-tests/win64/nise | 2 min 49 sec | nimibot-win-64-2 |
| system-tests/win64/niswitch | 9 min 35 sec | nimibot-win-64-2 |
| system-tests/win64/nitclk | 14 sec | nimibot-win-64-1 |
By far the worst offender is niswitch, which takes over twice as long as the rest. We need to investigate what is causing this and fix it.
The most likely reason niswitch and nifgen are so slow compared to the rest is that they open simulated sessions to devices whose driver runtime uses the DAQmx framework under the hood.
This is:
- notoriously slow, especially on larger systems (5-second session creation is not extraordinary)
- subject to an old bug in the DAQmx framework that forces session creation to be serialized
The system tests for each job run on a separate agent, but within each job we run tests for 5 versions of Python in parallel. That is a lot of multi-second tests getting serialized!
The mechanism by which tests that use session-based simulation of DAQmx framework devices are serialized is a global lock we call `daqmx_sim_db_lock`; just grep for that keyword. It is only used in src/nifgen/system_tests/test_system_nifgen.py and src/niswitch/system_tests/test_system_niswitch.py.
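Since the serialization has to hold across the 5 Python-version jobs running in parallel (separate processes), an in-process `threading.Lock` is not enough. A minimal sketch of a cross-process lock in the same spirit as `daqmx_sim_db_lock` — names, paths, and timeouts here are illustrative, not the repo's actual implementation:

```python
import contextlib
import os
import tempfile
import time

@contextlib.contextmanager
def daqmx_sim_lock(lock_path=None, poll_interval=0.05, timeout=120.0):
    """Cross-process mutual exclusion via an atomic lock-file create.

    os.open() with O_CREAT | O_EXCL fails if the file already exists, which
    gives us an atomic "test and acquire" that works across processes.
    """
    lock_path = lock_path or os.path.join(tempfile.gettempdir(), 'daqmx_sim_db.lock')
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break  # we own the lock now
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError('could not acquire DAQmx simulation lock')
            time.sleep(poll_interval)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)  # release: other processes can now acquire
```

A test would wrap only the DAQmx-sensitive work in `with daqmx_sim_lock(): ...`.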
So potential ways to make these tests faster are:
- Simulate a model that doesn't use the DAQmx framework. Those work fast and reliably, which is why we use them elsewhere. This may not be possible, either due to a driver runtime bug (make sure it's filed and referenced in the test code!) or because the thing being tested only exists on specific models.
- Use persistent simulation for those tests. The drawback is that now there are more requirements on the system running the tests.
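For the first option, session-based simulation in nimi-python is driven by the `Simulate`/`DriverSetup` init options string. A small helper to build that string — the specific model and board type below are purely illustrative, and whether a given model avoids the DAQmx framework is an assumption that would need checking per driver:

```python
def sim_options(model, board_type='PXIe'):
    """Build the init options string nimi-python drivers accept for
    session-based simulation. Model/board names here are examples only."""
    return 'Simulate=1, DriverSetup=Model:{0}; BoardType:{1}'.format(model, board_type)

# Hypothetical usage (model name illustrative, not a recommendation):
#     session = nifgen.Session('FakeDevice', options=sim_options('5433 (2CH)'))
```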
I looked at the nimi-bot logs from PR #1268. For example, in job win32/job/niswitch/274/console there are some obvious offenders:
```
========================== slowest 5 test durations ===========================
223.97s call src/niswitch/system_tests/test_system_niswitch.py::test_continuous_software_scanning
153.20s call src/niswitch/system_tests/test_system_niswitch.py::test_enum_attribute
0.12s setup src/niswitch/system_tests/test_system_niswitch.py::test_relayclose
0.08s call src/niswitch/system_tests/test_system_niswitch.py::test_write_only_attribute
0.07s call src/niswitch/system_tests/test_system_niswitch.py::test_error_message
```
As expected, those are tests that use session-based simulation of a device that uses the DAQmx driver framework. But they should really not take this long. One mistake: the global lock we introduced is only needed during Session creation and destruction/close, yet we hold it for the entire duration of the test.
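Narrowing the lock scope means acquiring it only around session open and close, so the test body itself runs unserialized. A sketch of the intended shape, using a stand-in session class (the real code would open e.g. `niswitch.Session` and use the actual `daqmx_sim_db_lock`):

```python
import threading

daqmx_sim_db_lock = threading.Lock()  # simplified stand-in for the real global lock

class FakeSession:
    """Stand-in for a driver session, so the locking pattern is runnable here."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

def open_simulated_session():
    # Hold the lock only while the session is created...
    with daqmx_sim_db_lock:
        return FakeSession()

def close_simulated_session(session):
    # ...and again only while it is destroyed.
    with daqmx_sim_db_lock:
        session.close()

# The multi-minute test body runs between open and close WITHOUT the lock,
# so parallel jobs no longer serialize behind each other's test bodies.
```

In pytest terms this would naturally live in a fixture: acquire the lock in the fixture's setup to create the session, release it, yield the session to the test, then reacquire it during teardown to close.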
We should have a goal that no test takes over 1 second.
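One lightweight way to enforce a per-test budget is a timing decorator (or an autouse pytest fixture doing the same thing); this is a sketch of the idea, not something currently in the repo, and `max_duration` is a hypothetical name:

```python
import functools
import time

def max_duration(seconds=1.0):
    """Decorator that fails a test if its wall-clock time exceeds `seconds`."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            result = fn(*args, **kwargs)
            elapsed = time.monotonic() - start
            assert elapsed < seconds, '{0} took {1:.2f}s (budget {2}s)'.format(
                fn.__name__, elapsed, seconds)
            return result
        return wrapper
    return deco
```

Running pytest with `--durations=0` in CI alongside this would make any budget regressions visible immediately.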
The intermittent failure due to simulated device creation race conditions is being tracked in the internal NI Bug #AB1235678. Fixing that should also improve the test times of those that use session-based simulation of devices using the DAQmx driver framework.
The following data is based on the latest runs using the new nimibot system (AWS-hosted VMs + GitHub Actions). The worst offenders are:
- system_tests_win32 (nidigital) succeeded 22 hours ago in 43m 19s
- system_tests_win32 (nitclk) succeeded 22 hours ago in 38m 9s
These are the ones we should focus on optimizing, because they are the long poles at the moment.