fix: improve experience for users using multiple copies of cve-bin-tool in parallel
We have a standard set of instructions for folk using cve-bin-tool in parallel jobs that recommends that people separate the database updates from the scans as follows:
https://cve-bin-tool.readthedocs.io/en/latest/how_to_guides/multiple_scans_at_once.html
But it's easy for people to miss that, and we're not handling the database well enough for it to just magically work, as we've seen in some recent issues including #4773.
I think we can do better and have a few ideas we could implement:
- Make it so that you can run cve-bin-tool to do an update without requiring a filename to scan. This should maybe be an option (e.g. something like cve-bin-tool --update-only) because in a lot of cases, we want people to know immediately that they need to specify a directory to scan, otherwise they'll see it start running and come back 20 minutes later and the cache will be updated but they'll have no results to look at.
- Handle some sort of basic auto-detection of parallel instances and direct people to the doc link above OR automatically switch any jobs to use -u never if there's already a job running and spit out a giant warning message explaining that it was done and why, with a link to the docs. Could use a lock file or something for this? This will require some finesse so it won't break people's existing setups and accidentally cause things to never update.
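To make the lock file idea a bit more concrete, here is a minimal sketch, not a proposal for the actual implementation. Everything in it is an assumption for illustration: the names update_lock and resolve_update_mode are hypothetical, the lock path is chosen by the caller, and none of this reflects how cve-bin-tool handles its cache internally. It relies on os.open with O_CREAT | O_EXCL, which fails if the file already exists, so only one process can hold the lock at a time.

```python
import logging
import os
from contextlib import contextmanager

LOGGER = logging.getLogger("lock_sketch")

# Doc link from the issue text; printed in the warning.
DOCS_URL = (
    "https://cve-bin-tool.readthedocs.io/en/latest/"
    "how_to_guides/multiple_scans_at_once.html"
)


@contextmanager
def update_lock(lock_path):
    """Hypothetical helper: yield True if we created the lock file,
    False if another process already holds it."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        yield False
        return
    try:
        with os.fdopen(fd, "w") as handle:
            # Record the owner so a stale lock can be investigated.
            handle.write(str(os.getpid()))
        yield True
    finally:
        # Only the process that created the lock removes it.
        os.remove(lock_path)


def resolve_update_mode(requested, lock_acquired):
    """Hypothetical helper: fall back to 'never' when another
    instance appears to be updating, and warn loudly about it."""
    if lock_acquired:
        return requested
    LOGGER.warning(
        "Another instance appears to be updating the database; "
        "switching this run to '-u never'. See %s",
        DOCS_URL,
    )
    return "never"
```

One caveat with this approach: a process that crashes without cleaning up leaves a stale lock behind, which is exactly the "never update" failure mode mentioned above, so a real version would need some way to detect and expire stale locks.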
I'm open to better ideas if anyone has any, though.
Hi! @terriko I’ve reviewed the issue (#4777) regarding improving the experience for users running multiple copies of cve-bin-tool in parallel. I would like to work on this issue and help implement a solution that introduces a --update-only flag to allow users to update the database without specifying a directory to scan. Additionally, I plan to implement a lock mechanism to prevent parallel database updates and handle the -u never option to ensure smoother execution.
Could you please assign this issue to me? I’d be happy to contribute and ensure it’s resolved effectively.
Thank you!
@Gyan-max let us know if you get stuck anywhere! Definitely put the --update-only flag in a separate PR since that should be simple and easy to merge, and don't forget to update the docs and provide some tests with it when you do. The other part is probably going to be a lot harder.
Hi @terriko,
Thanks for assigning me this issue! Here’s my proposed approach to resolving it:
1️⃣ Implement --update-only Flag (First PR)
Add a new CLI option --update-only that updates the database without requiring a file or directory. Ensure that it exits immediately after updating. Update the documentation to include this new flag. Write tests to verify that it works correctly.
2️⃣ Implement Lock Mechanism (Second PR)
Introduce a lock file system to prevent multiple instances from updating the database simultaneously. If a parallel update is detected, either warn the user or automatically set -u never. Ensure the lock file is safely removed after execution. Write test cases to validate proper handling.
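
For the first PR, the argument handling could look roughly like the sketch below. This is a standalone argparse example under stated assumptions, not cve-bin-tool's real CLI code: build_parser, parse_args, and run are hypothetical names, and update_database and scan are placeholder callables standing in for whatever the tool actually calls. The point is only to show the scan target becoming optional when --update-only is given, while still failing fast when neither is supplied.

```python
import argparse


def build_parser():
    """Hypothetical parser with an optional scan target."""
    parser = argparse.ArgumentParser(prog="update-only-sketch")
    parser.add_argument(
        "directory",
        nargs="?",  # optional so that --update-only can be used alone
        help="file or directory to scan",
    )
    parser.add_argument(
        "--update-only",
        action="store_true",
        help="update the database and exit without scanning",
    )
    return parser


def parse_args(argv):
    """Parse argv, failing immediately if there is nothing to do."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.update_only and args.directory is None:
        # parser.error prints the message and exits with status 2,
        # so users learn right away that a scan target is required.
        parser.error("specify a directory to scan, or use --update-only")
    return args


def run(argv, update_database, scan):
    """Hypothetical entry point; both callables are placeholders."""
    args = parse_args(argv)
    update_database()
    if args.update_only:
        return 0  # exit right after the update, as described above
    scan(args.directory)
    return 0
```

With this shape, calling run(["--update-only"], ...) performs the update and returns without scanning, and calling it with no arguments exits with an error before any long-running work starts.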
I’ll begin with the --update-only flag and submit a PR soon. Let me know if you have any additional suggestions!
Hi @terriko,
Apologies for the delay in working on this issue; I had exams. I’m still interested in contributing and will resume work on the --update-only flag soon. I plan to implement it and submit the PR within the next few days.
Please let me know if there have been any updates or changes regarding the approach. Thanks for your patience!