
test: improve performance on our slowest tests

Open terriko opened this issue 1 year ago • 5 comments

In #4319 I'm switching pytest to print our longest-duration tests so we can see about improving the performance of our test suite. On a random local run, here's what I saw:

======================================== slowest 50 durations ========================================
291.38s call     test/test_language_scanner.py::TestLanguageScanner::test_language_package[/home/terri/Code/cve-bin-tool/test/language_data/Cargo.lock-products1]
203.69s call     test/test_language_scanner.py::TestLanguageScanner::test_language_package[/home/terri/Code/cve-bin-tool/test/language_data/Gemfile.lock-products2]
119.99s call     test/test_language_scanner.py::TestLanguageScanner::test_language_package[/home/terri/Code/cve-bin-tool/test/language_data/requirements.txt-products3]
99.68s call     test/test_language_scanner.py::TestLanguageScanner::test_language_package[/home/terri/Code/cve-bin-tool/test/language_data/renv.lock-products0]
49.71s call     test/test_language_scanner.py::TestLanguageScanner::test_language_package[/home/terri/Code/cve-bin-tool/test/language_data/go.mod-products6]
38.39s call     test/test_language_scanner.py::TestLanguageScanner::test_language_package[/home/terri/Code/cve-bin-tool/test/language_data/package-lock.json-products4]
23.17s call     test/test_language_scanner.py::TestLanguageScanner::test_language_package[/home/terri/Code/cve-bin-tool/test/language_data/cpanfile-products9]
11.63s call     test/test_requirements.py::test_requirements
11.18s call     test/test_cli.py::TestCLI::test_EPSS_percentile
11.14s call     test/test_cli.py::TestCLI::test_EPSS_probability
10.74s call     test/test_language_scanner.py::TestLanguageScanner::test_java_package[/home/terri/Code/cve-bin-tool/test/language_data/pom.xml-product_list0]
9.26s setup    test/test_available_fix.py::TestAvailableFixReport::test_debian_backport_fix_output
7.81s call     test/test_language_scanner.py::TestLanguageScanner::test_language_package[/home/terri/Code/cve-bin-tool/test/language_data/Package.resolved-products7]
7.31s call     test/test_language_scanner.py::TestLanguageScanner::test_language_package_none_found[/home/terri/Code/cve-bin-tool/test/language_data/fail_pom.xml]
5.22s call     test/test_language_scanner.py::TestLanguageScanner::test_language_package[/home/terri/Code/cve-bin-tool/test/language_data/.package-lock.json-products5]
4.83s call     test/test_cli.py::TestCLI::test_sbom_detection
4.83s call     test/test_cli.py::TestCLI::test_CVSS_score
4.82s call     test/test_language_scanner.py::TestLanguageScanner::test_language_package[/home/terri/Code/cve-bin-tool/test/language_data/composer.lock-products8]
4.53s call     test/test_cli.py::TestCLI::test_SBOM
3.82s call     test/test_source_purl2cpe.py::TestSourceOSV::test_db_contents[1-False]
2.20s call     test/test_language_scanner.py::TestLanguageScanner::test_language_package[/home/terri/Code/cve-bin-tool/test/language_data/pubspec.lock-products10]
2.18s call     test/test_cli.py::TestCLI::test_disabled_sources
2.12s setup    test/test_cli.py::TestCLI::test_extract_bad_zip_messages
2.01s setup    test/test_cli.py::TestCLI::test_sbom_detection
1.65s setup    test/test_cli.py::TestCLI::test_EPSS_probability
1.59s setup    test/test_cli.py::TestCLI::test_SBOM
1.49s call     test/test_output_engine.py::TestOutputEngine::test_output_file_wrapper
1.44s call     test/test_cli.py::TestCLI::test_severity
1.39s setup    test/test_cli.py::TestCLI::test_EPSS_percentile
1.31s call     test/test_cli.py::TestCLI::test_quiet_mode
1.13s call     test/test_merge.py::TestMergeReports::test_valid_merge[filepaths0-merged_data0]
1.00s setup    test/test_cli.py::TestCLI::test_extract_encrypted_zip_messages
0.99s setup    test/test_html.py::TestOutputHTML::test_interactive_mode_print_mode_switching[chromium]
0.95s call     test/test_language_scanner.py::TestLanguageScanner::test_javascript_package_none_found[/home/terri/Code/cve-bin-tool/test/language_data/fail-package-lock.json]
0.89s setup    test/test_cli.py::TestCLI::test_disabled_sources
0.83s setup    test/test_cli.py::TestCLI::test_config_generator[args0-config.yaml-expected_contents0]
0.80s call     test/test_sbom.py::TestSBOM::test_invalid_xml[/home/terri/Code/cve-bin-tool/test/sbom/spdx_test.spdx.xml-cyclonedx-True]
0.73s setup    test/test_cli.py::TestCLI::test_config_generator[args1-config.toml-expected_contents1]
0.70s call     test/test_cli.py::TestCLI::test_extract_bad_zip_messages
0.68s call     test/test_cli.py::TestCLI::test_runs
0.66s call     test/test_sbom.py::TestSBOM::test_invalid_xml[/home/terri/Code/cve-bin-tool/test/sbom/swid_test.xml-cyclondedx-True]
0.65s call     test/test_cli.py::TestCLI::test_skips
0.64s call     test/test_merge.py::TestMergeReports::test_valid_cve_scanner_instance[filepaths0]
0.60s call     test/test_sbom.py::TestSBOM::test_sbom_detection[/home/terri/Code/cve-bin-tool/test/sbom/cyclonedx_test.xml-cyclonedx]
0.55s setup    test/test_cli.py::TestCLI::test_invalid_parameter
0.55s setup    test/test_cli.py::TestCLI::test_version
0.55s setup    test/test_cli.py::TestCLI::test_severity
0.55s setup    test/test_cli.py::TestCLI::test_quiet_mode
0.54s setup    test/test_cli.py::TestCLI::test_skips
0.54s setup    test/test_cli.py::TestCLI::test_usage
====================================== short test summary info =======================================

It looks like our language scanner tests are noticeably slower on my machine. If I had to guess, the primary problem is the sheer number of products and vulnerabilities those tests look up, so I would start by reducing the test files to look up a minimal number of products, and make sure that the products they look up have a minimal number of vulnerabilities. Exactly how many products you should keep will depend on what's needed to test different parsing and to conform to what a full lock file with dependencies should look like for that language, but if you can get enough test coverage with 1 product that has 1 vulnerability, go for it!
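As a rough illustration of the "fewer products per fixture" idea, here is a minimal sketch that trims a requirements.txt-style file down to its first pinned entry. The helper name `trim_requirements` and the sample package pins are made up for this example and are not part of cve-bin-tool; a real fixture would still need to be checked by hand so that it exercises the parser properly.

```python
def trim_requirements(text: str, keep: int = 1) -> str:
    """Keep only the first `keep` pinned 'name==version' lines.

    Hypothetical helper for illustration; not part of cve-bin-tool.
    """
    kept = []
    for line in text.splitlines():
        if len(kept) >= keep:
            break
        stripped = line.strip()
        # Skip blank lines, comments, and anything that is not an exact pin.
        if not stripped or stripped.startswith("#") or "==" not in stripped:
            continue
        kept.append(stripped)
    return "".join(f"{entry}\n" for entry in kept)


# Made-up sample data, only here to show the shape of the input.
sample = "# deps\nrequests==2.25.0\nflask==1.1.2\n\nurllib3==1.26.4\n"
trimmed = trim_requirements(sample, keep=1)
```

Fewer pinned entries in the fixture means fewer product lookups per test, which is where the time appears to be going.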

It's entirely possible that there are also performance gains to be had in the language scanner code, if you want to do a deeper dive there too!

terriko avatar Aug 08 '24 00:08 terriko

Hi @terriko, this looks like an interesting issue! I’d love to help speed up the tests. I’ll start by checking the longest-running ones and see if we can reduce the number of product lookups while keeping the coverage solid. Also, I’ll take a look at the language scanner to see if there are any performance tweaks we can make. Let me know if there are any specific things I should keep in mind. Excited to contribute!

Gyan-max avatar Feb 08 '25 18:02 Gyan-max

@Gyan-max thanks! I think reducing the product lookups is going to make the biggest difference even if we make other performance tweaks, so probably start there.

terriko avatar Feb 10 '25 21:02 terriko

@terriko To improve the performance of our test suite, I propose the following solutions:

  • Reduce Test File Sizes
  1. Limit the number of products in each test file to the minimum required for effective parsing validation.
  2. Ensure each product has only one vulnerability where possible.
  • Optimize Vulnerability Lookups
  1. Investigate if unnecessary lookups are being performed.
  2. Consider caching or mocking responses to reduce execution time.
  • Enable Parallel Execution
  1. Utilize pytest-xdist (pytest -n auto) to run tests in parallel.
  • Profile and Optimize Code
  1. Use pytest --durations=10 --profile to identify performance bottlenecks.
  2. Optimize the parsing logic and data structures in language_scanner.py for efficiency.

These steps should help reduce execution time while maintaining test coverage.
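To make the caching suggestion concrete, here is a minimal sketch using `functools.lru_cache` from the standard library. The functions `slow_lookup` and `cached_lookup`, and the vendor/product names, are hypothetical stand-ins and not cve-bin-tool's actual lookup API; the point is only that a repeated query for the same product and version reaches the slow backend once.

```python
from functools import lru_cache

calls = []  # records every time the slow backend is actually hit


def slow_lookup(vendor: str, product: str, version: str) -> tuple:
    """Hypothetical stand-in for an expensive vulnerability database query."""
    calls.append((vendor, product, version))
    return (f"placeholder result for {product} {version}",)


@lru_cache(maxsize=None)
def cached_lookup(vendor: str, product: str, version: str) -> tuple:
    # Arguments are hashable strings, so they can serve as the cache key.
    return slow_lookup(vendor, product, version)


first = cached_lookup("examplevendor", "exampleproduct", "1.0")
second = cached_lookup("examplevendor", "exampleproduct", "1.0")
```

Whether caching helps here depends on how often the same product is looked up within a run; if each test queries different products, trimming the fixtures is likely to matter more.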

Shrishti1701 avatar Mar 05 '25 09:03 Shrishti1701

Hi @terriko, I'd like to work on this issue as part of my GSoC preparation.
Could you assign it to me? I will analyze the test performance and suggest improvements.
Thanks!

SachinMugade8797 avatar Apr 02 '25 17:04 SachinMugade8797

Hi, I’ve opened a PR that addresses this by adding lazy CVE DB initialization and language short-circuiting in scan_file, along with minimizing the fail_pom.xml fixture. The language POM test went from 0.18 s → 0.13 s on my machine.
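For readers unfamiliar with the two techniques named above, here is a minimal sketch of lazy initialization combined with an early return. It is not the code from the PR: `FakeCVEDB`, `LazyScanner`, and `KNOWN_NAMES` are hypothetical names, and only the method name `scan_file` is taken from the comment.

```python
from pathlib import Path


class FakeCVEDB:
    """Hypothetical stand-in for an expensive-to-open database handle."""

    instances = 0  # counts how many times the "database" was opened

    def __init__(self):
        FakeCVEDB.instances += 1


class LazyScanner:
    # Assumed set of recognised file names, for illustration only.
    KNOWN_NAMES = {"requirements.txt", "Cargo.lock", "pom.xml"}

    def __init__(self):
        self._db = None  # nothing is opened at construction time

    @property
    def db(self):
        # Lazy initialization: create the handle on first use only.
        if self._db is None:
            self._db = FakeCVEDB()
        return self._db

    def scan_file(self, filename: str):
        name = Path(filename).name
        # Short-circuit: unrecognised files return before touching the db.
        if name not in self.KNOWN_NAMES:
            return None
        return (name, self.db)


scanner = LazyScanner()
skipped = scanner.scan_file("notes/readme.md")
created_after_skip = FakeCVEDB.instances
hit = scanner.scan_file("project/pom.xml")
```

With this shape, a file that no parser recognises never pays the database setup cost, and a recognised file pays it at most once per scanner.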

PR: #5390

gheyderov avatar Oct 10 '25 08:10 gheyderov