Nettacker icon indicating copy to clipboard operation
Nettacker copied to clipboard

fix multi-threading issue

Open Davda-James opened this issue 2 months ago • 3 comments

Proposed change

Replaces busy-wait loops with event-driven dependency resolution and adds cross-process thread sharing for better resource utilization.

Changes and approach I took

  • Eliminated busy-wait loops that polled database every 100ms
  • Fixed subprocesses dying instead of helping other processes
  • Tasks queue when dependencies unavailable, execute automatically when ready
  • Idle processes help busy ones through shared task queue

Closes issue : #595

Files changed

nettacker/core/queue_manager.py (new) 
nettacker/core/lib/base.py
nettacker/core/app.py
nettacker/core/module.py

Type of change

  • [x] Bugfix (non-breaking change which fixes an issue)

Checklist

  • [x] I've followed the [contributing guidelines][contributing-guidelines]
  • [x] I've run make pre-commit, it didn't generate any changes
  • [x] I've run make test, all tests passed locally

Davda-James avatar Oct 05 '25 23:10 Davda-James

Summary by CodeRabbit

  • New Features
    • Smarter handling of step dependencies to auto-queue and resume tasks when prerequisites are ready.
  • Performance
    • Introduces a shared, multi-process thread pool for more efficient parallel scans.
    • Event-driven queue reduces busy-waiting and CPU usage during scans.
  • Reliability
    • Improved retry and backoff logic to reduce timeouts and flaky dependency waits.
    • More consistent execution order for independent, generator, and consumer steps.
  • Logging
    • Clearer verbose logs on dependency availability, step ordering, and task execution status.

Walkthrough

Adds a new event-driven dependency resolver and a cross-process/thread worker pool; integrates resolver into BaseEngine flows; updates module execution to optionally use the shared pool; and initializes/shuts down the shared pool in app startup/shutdown.

Changes

Cohort / File(s) Summary of changes
Thread pool lifecycle in app
nettacker/core/app.py
Imports initialize_thread_pool and shutdown_thread_pool; initializes the shared pool before multiprocessing using derived process and worker counts; captures result from wait_for_threads_to_finish, calls shutdown_thread_pool(), and returns the captured result.
Base engine dependency resolver integration
nettacker/core/lib/base.py
Imports dependency_resolver from nettacker/core/queue_manager; replaces direct DB polling with dependency_resolver.get_dependency_results_efficiently (with fallback polling/backoff and retry limits); notifies resolver when saving temp events; returns/skips steps when dependencies are not yet available.
Module execution with shared thread pool
nettacker/core/module.py
Imports queue_manager/thread_pool; refactors sort_loops to categorize steps into independent_steps, dependency_generators, and dependency_consumers; start submits engine.run to shared pool if available (falls back to local Thread), and preserves per-host concurrency via existing wait logic.
New queue and dependency system
nettacker/core/queue_manager.py
New module adding DependentTask dataclass, DependencyResolver (pending-task queue, cache, notify/execute semantics, expiry), and CrossProcessThreadPool (multi-process workers with per-process threads); exposes globals dependency_resolver, thread_pool, and helpers initialize_thread_pool / shutdown_thread_pool.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–90 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title “fix multi-threading issue” clearly reflects the primary change of resolving threading and dependency resolution problems in the codebase and is concise and directly related to the main focus of the pull request.
Description Check ✅ Passed The description outlines the replacement of busy-wait loops with event-driven dependency resolution, the addition of cross-process thread sharing, references the relevant files, and ties to the issue being closed, making it clearly related to the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Oct 05 '25 23:10 coderabbitai[bot]

Please also look into CodeRabbit's suggestions and also, the CI/CD tests aren't passing

[2025-10-06 10:32:12][+] Worker process 0 started with 1 threads
[2025-10-06 10:37:19][!] Module dir_scan tasks did not complete within timeout
[2025-10-06 10:42:19][!] Module ssl_certificate_weak_signature_vuln tasks did not complete within timeout
[2025-10-06 10:47:19][!] Module galera_webtemp_cve_2021_40960_vuln tasks did not complete within timeout
[2025-10-06 10:52:19][!] Module gurock_testrail_cve_2021_40875_vuln tasks did not complete within timeout
[2025-10-06 10:57:19][!] Module wp_plugin_cve_2021_38314_vuln tasks did not complete within timeout
[2025-10-06 11:08:36][!] Module log4j_cve_2021_44228_vuln tasks did not complete within timeout
[2025-10-06 11:13:36][!] Module joomla_version_scan tasks did not complete within timeout
[2025-10-06 11:18:36][!] Module forgerock_am_cve_2021_35464_vuln tasks did not complete within timeout
[2025-10-06 11:23:37][!] Module teamcity_cve_2024_27198_vuln tasks did not complete within timeout

I think there is some issue in the way you have defined your worker subprocess. Do look into that.

pUrGe12 avatar Oct 06 '25 11:10 pUrGe12

@pUrGe12 I raised a PR with change you told of sort_loops, can you please review it and tell if any further modifications or changes needed ? PR

Davda-James avatar Oct 07 '25 14:10 Davda-James