fix multi-threading issue
Proposed change
Replaces busy-wait loops with event-driven dependency resolution and adds cross-process thread sharing for better resource utilization.
Changes and approach I took
- Eliminated busy-wait loops that polled database every 100ms
- Fixed subprocesses dying instead of helping other processes
- Tasks queue when dependencies unavailable, execute automatically when ready
- Idle processes help busy ones through shared task queue
Closes issue : #595
Files changed
nettacker/core/queue_manager.py (new)
nettacker/core/lib/base.py
nettacker/core/app.py
nettacker/core/module.py
Type of change
- [x] Bugfix (non-breaking change which fixes an issue)
Checklist
- [x] I've followed the [contributing guidelines][contributing-guidelines]
- [x] I've run
make pre-commit, it didn't generate any changes - [x] I've run
make test, all tests passed locally
Summary by CodeRabbit
- New Features
- Smarter handling of step dependencies to auto-queue and resume tasks when prerequisites are ready.
- Performance
- Introduces a shared, multi-process thread pool for more efficient parallel scans.
- Event-driven queue reduces busy-waiting and CPU usage during scans.
- Reliability
- Improved retry and backoff logic to reduce timeouts and flaky dependency waits.
- More consistent execution order for independent, generator, and consumer steps.
- Logging
- Clearer verbose logs on dependency availability, step ordering, and task execution status.
Walkthrough
Adds a new event-driven dependency resolver and a cross-process/thread worker pool; integrates resolver into BaseEngine flows; updates module execution to optionally use the shared pool; and initializes/shuts down the shared pool in app startup/shutdown.
Changes
| Cohort / File(s) | Summary of changes |
|---|---|
Thread pool lifecycle in appnettacker/core/app.py |
Imports initialize_thread_pool and shutdown_thread_pool; initializes the shared pool before multiprocessing using derived process and worker counts; captures result from wait_for_threads_to_finish, calls shutdown_thread_pool(), and returns the captured result. |
Base engine dependency resolver integrationnettacker/core/lib/base.py |
Imports dependency_resolver from nettacker/core/queue_manager; replaces direct DB polling with dependency_resolver.get_dependency_results_efficiently (with fallback polling/backoff and retry limits); notifies resolver when saving temp events; returns/skips steps when dependencies are not yet available. |
Module execution with shared thread poolnettacker/core/module.py |
Imports queue_manager/thread_pool; refactors sort_loops to categorize steps into independent_steps, dependency_generators, and dependency_consumers; start submits engine.run to shared pool if available (falls back to local Thread), and preserves per-host concurrency via existing wait logic. |
New queue and dependency systemnettacker/core/queue_manager.py |
New module adding DependentTask dataclass, DependencyResolver (pending-task queue, cache, notify/execute semantics, expiry), and CrossProcessThreadPool (multi-process workers with per-process threads); exposes globals dependency_resolver, thread_pool, and helpers initialize_thread_pool / shutdown_thread_pool. |
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45–90 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The title “fix multi-threading issue” clearly reflects the primary change of resolving threading and dependency resolution problems in the codebase and is concise and directly related to the main focus of the pull request. |
| Description Check | ✅ Passed | The description outlines the replacement of busy-wait loops with event-driven dependency resolution, the addition of cross-process thread sharing, references the relevant files, and ties to the issue being closed, making it clearly related to the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.
Please also look into CodeRabbit's suggestions and also, the CI/CD tests aren't passing
[2025-10-06 10:32:12][+] Worker process 0 started with 1 threads
[2025-10-06 10:37:19][!] Module dir_scan tasks did not complete within timeout
[2025-10-06 10:42:19][!] Module ssl_certificate_weak_signature_vuln tasks did not complete within timeout
[2025-10-06 10:47:19][!] Module galera_webtemp_cve_2021_40960_vuln tasks did not complete within timeout
[2025-10-06 10:52:19][!] Module gurock_testrail_cve_2021_40875_vuln tasks did not complete within timeout
[2025-10-06 10:57:19][!] Module wp_plugin_cve_2021_38314_vuln tasks did not complete within timeout
[2025-10-06 11:08:36][!] Module log4j_cve_2021_44228_vuln tasks did not complete within timeout
[2025-10-06 11:13:36][!] Module joomla_version_scan tasks did not complete within timeout
[2025-10-06 11:18:36][!] Module forgerock_am_cve_2021_35464_vuln tasks did not complete within timeout
[2025-10-06 11:23:37][!] Module teamcity_cve_2024_27198_vuln tasks did not complete within timeout
I think there is some issue in the way you have defined your worker subprocess. Do look into that.
@pUrGe12 I raised a PR with change you told of sort_loops, can you please review it and tell if any further modifications or changes needed ? PR