pytest-based synthesis tests
Description
This PR introduces synthesis tests into the pytest framework for hls4ml.
The goal of this change is to automate the validation of HLS synthesis reports against predefined baselines, so that stability and correctness can be tracked over time.
Main Changes
- Baselines are shipped with this repository under `test/pytest/baselines/<backend>/<version>/<report>.json`. Each test case specifies its baseline name, matching the artifact names produced by CI so comparisons stay aligned. Baselines must be refreshed within the PR that changes the synthesis expectations; otherwise the tests will fail.
- `test/pytest/conftest.py` fixture: introduces the shared `synthesis_config` fixture that drives every synthesis test (see the fixture sketch after this list). It gathers:
  - whether synthesis should run (`RUN_SYNTHESIS`, default `false` if the env var is unset);
  - tool versions (`VIVADO_VERSION`, `VITIS_VERSION`, `QUARTUS_VERSION`, `ONEAPI_VERSION`), with defaults if the environment variables are unset;
  - backend-specific build arguments.
- Synthesis helper (`test/pytest/synthesis_helpers.py`), see the helper sketch after this list:
  - Main entry point: `run_synthesis_test`.
  - When `run_synthesis` is disabled the helper returns early; otherwise it builds, saves the report, and compares it to the selected baseline.
  - Quartus remains skipped, while Vivado, Vitis, and oneAPI execute normally.
  - Missing tools, missing baselines, or mismatched reports trigger `pytest.fail(...)`, so synthesis regressions surface immediately instead of being silently skipped.
  - The helper persists the generated reports as artifacts, which also helps bootstrap new baselines when they do not exist yet.
- Tolerances: backend-specific tolerances are still provisional and will need further tuning as we collect more data.
- CI integration (`test/pytest/ci-template.yml`):
  - Vivado and Vitis run inside Apptainer containers, sourcing the toolchains from CVMFS. This keeps the CI image smaller while letting us pick tool versions job by job.
  - oneAPI remains installed in the image for now and will move to the Apptainer container flow in a follow-up.
  - CI stores the synthesis reports as artifacts so we can compare performance against the tracked baselines.
- `generate_ci_yaml.py` now has a `SPLIT_BY_TEST_CASE` map. For any test file listed there, the script inspects its test functions, batches them according to the configured chunk size, and emits separate CI jobs (each passing `PYTESTFILE="path::test_fn"`). This keeps heavy synthesis suites like `test_keras_api` sharded without touching the test code (see the sharding sketch after this list).
- oneAPI report parsing has been hardened to tolerate values such as `n/a`, avoiding the previous `invalid literal for int() with base 10: 'n/a'` error (see the parsing sketch after this list).
- Current test coverage: `test_keras_api.py` is the first pytest using the synthesis flow. Each parametrized test now takes the `synthesis_config` fixture, formats a baseline filename (e.g. `hls4mlprj_keras_api_dense_{backend}_{io_type}.json`), and calls `run_synthesis_test(...)` after the assertions. Some cases (such as `test_conv2d`) guard the call with backend/strategy filters to avoid unsupported configurations or syntheses that take too long for the jobs.
  - To enable synthesis elsewhere, reproduce the same pattern.
  - Vivado, Vitis, and oneAPI synthesis paths are active; Quartus is currently not supported.
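Fixture sketch: a minimal outline of what `synthesis_config` gathers, for anyone wiring it into other test files. The default tool versions and the exact dictionary layout below are illustrative; the authoritative version lives in `test/pytest/conftest.py`.

```python
# conftest.py -- minimal sketch of the shared fixture described above.
# Default versions and the dictionary layout are placeholders.
import os

import pytest


def _env_flag(name, default=False):
    """Read a boolean flag from the environment; unset means `default`."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("true", "1", "yes")


@pytest.fixture(scope="session")
def synthesis_config():
    """Everything a synthesis test needs: run flag, tool versions, build args."""
    return {
        "run_synthesis": _env_flag("RUN_SYNTHESIS", default=False),
        "tool_versions": {
            "Vivado": os.environ.get("VIVADO_VERSION", "2020.1"),
            "Vitis": os.environ.get("VITIS_VERSION", "2024.1"),
            "Quartus": os.environ.get("QUARTUS_VERSION", "latest"),
            "oneAPI": os.environ.get("ONEAPI_VERSION", "2025.0"),
        },
        # Backend-specific build arguments passed to hls_model.build(...);
        # the entries below are examples, not the committed values.
        "build_args": {
            "Vivado": {"csim": False, "synth": True, "export": False},
            "Vitis": {"csim": False, "synth": True, "export": False},
        },
    }
```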
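Helper sketch: a rough outline of the control flow in `run_synthesis_test`, including the tolerance-based comparison. Function signatures, report keys, numeric report values, and the tolerance numbers are assumptions for illustration; the real logic lives in `test/pytest/synthesis_helpers.py`.

```python
# synthesis_helpers.py -- rough sketch of the control flow described above.
# Signatures, report keys and tolerance values are illustrative only.
import json
from pathlib import Path

import pytest

# Provisional, backend-specific tolerances in percent (placeholder values).
TOLERANCES = {
    "Vitis": {"BestLatency": 1.0, "WorstLatency": 1.0, "LUT": 5.0, "FF": 5.0},
}


def compare_reports(report, baseline, tolerances):
    """Fail the test if any baseline metric deviates beyond its tolerance."""
    for key, expected in baseline.items():
        got = report.get(key)
        tol = tolerances.get(key, 1.0)  # default when a metric has no explicit entry (placeholder)
        if got is None or abs(got - expected) > abs(expected) * tol / 100.0:
            pytest.fail(f"{key}: expected {expected}, got {got} (tolerance={tol}%)")


def run_synthesis_test(config, hls_model, baseline_file_name, backend):
    """Build the project, persist the report, and compare it to the baseline."""
    if not config["run_synthesis"]:
        return  # synthesis disabled: return early, nothing to check
    if backend == "Quartus":
        return  # Quartus is skipped for now

    report = hls_model.build(**config["build_args"].get(backend, {}))

    # Persist the report so CI can expose it as an artifact (and so it can
    # seed a new baseline when none exists yet).
    out = Path("reports") / baseline_file_name
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(report, default=str))

    version = config["tool_versions"][backend]
    baseline_path = Path(__file__).parent / "baselines" / backend / version / baseline_file_name
    if not baseline_path.exists():
        pytest.fail(f"Missing baseline: {baseline_path}")
    compare_reports(report, json.loads(baseline_path.read_text()), TOLERANCES.get(backend, {}))
```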
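Sharding sketch: the `SPLIT_BY_TEST_CASE` mechanism boils down to enumerating the test functions of the listed files and chunking them into separate jobs. The map contents, chunk size, and exact `PYTESTFILE` formatting below are illustrative, not the committed values.

```python
# generate_ci_yaml.py -- sketch of the SPLIT_BY_TEST_CASE idea described above.
import ast
from pathlib import Path

SPLIT_BY_TEST_CASE = {
    # test file to shard -> number of test functions per generated CI job
    "test_keras_api.py": 1,
}


def collect_test_functions(test_file):
    """Return the names of the top-level test_* functions in a pytest file."""
    tree = ast.parse(Path(test_file).read_text())
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name.startswith("test_")
    ]


def pytestfile_values(test_file):
    """Yield one PYTESTFILE value per CI job, e.g. 'test_keras_api.py::test_dense'."""
    chunk = SPLIT_BY_TEST_CASE[test_file]
    names = collect_test_functions(test_file)
    for i in range(0, len(names), chunk):
        yield " ".join(f"{test_file}::{name}" for name in names[i : i + chunk])
```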
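Parsing sketch: the hardened oneAPI number parsing amounts to treating placeholder cells as missing instead of calling `int()` on them directly. A possible shape of that guard (the function name is hypothetical):

```python
def parse_oneapi_int(raw, default=None):
    """Convert a oneAPI report cell to int, tolerating placeholders like 'n/a'."""
    if raw is None:
        return default
    text = str(raw).strip().replace(",", "")
    if text.lower() in ("", "n/a", "na", "-"):
        return default
    try:
        return int(float(text))  # handles both '123' and '123.0'
    except ValueError:
        return default
```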
Type of change
- [x] Other: improvement of testing infrastructure using pytest
Tests
Synthesis tests run conditionally and are compared against versioned baselines when `RUN_SYNTHESIS=true` is set in the environment, following the pattern sketched below.
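For reference, the opt-in pattern in a parametrized test looks roughly like this; the exact signature of `run_synthesis_test` is assumed from the description above, and `build_hls_model` is a hypothetical stand-in for the model construction already present in the test.

```python
# test_keras_api.py -- sketch of the opt-in pattern; build_hls_model is a
# hypothetical placeholder for the existing test setup code.
import pytest

from synthesis_helpers import run_synthesis_test


@pytest.mark.parametrize("backend", ["Vivado", "Vitis", "oneAPI"])
@pytest.mark.parametrize("io_type", ["io_parallel", "io_stream"])
def test_dense(synthesis_config, backend, io_type):
    # Existing behaviour: build the hls4ml project and run the numerical checks.
    hls_model = build_hls_model(backend, io_type)  # hypothetical helper

    # New behaviour: pick the baseline name and hand off to the synthesis helper.
    baseline_file_name = f"hls4mlprj_keras_api_dense_{backend}_{io_type}.json"
    run_synthesis_test(synthesis_config, hls_model, baseline_file_name, backend)
```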
Checklist
- [x] I have read the guidelines for contributing.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have installed and run `pre-commit` on the files I edited or added.
- [ ] I have added tests that prove my feature works.
Can we test this by actually running the synthesis tests?
Yes, but if the env var `RUN_SYNTHESIS` is not set, the default is `False`. I can add a line in `ci-template.yml` to export `RUN_SYNTHESIS=True`.
I added support for comparing the new oneAPI report and modified `generate_ci_yaml.py` to run only `test_keras_api.py`. This last change is only temporary (this is a draft PR), aimed at focusing on the tests where synthesis is supported, to speed up debugging and avoid overloading the runners.
Do you all agree with the approach of storing all the baselines in a dedicated GitHub repo and adding it as a module under `test/pytest/baselines` in hls4ml?
If yes, we need to create the repository within Fastmachinelearning, and then I can upload the artifacts generated by the synthesis tests as the baselines.
I generally agree about storing the baselines in a dedicated repo. Is CERN GitLab or regular GitHub preferred? We create the testing containers in GitLab. I don't really have a preference.
Thanks, this looks very nice to me! From the test logs, it seems to have triggered the synthesis tests successfully and the code looks good to me, so I think this is basically ready for merge. I just have a few questions for the way forward:
- What are the plans to extend this to more tests? How much can we realistically do with the resources we have available?
- As you said, the thresholds are preliminary, how are these going to be tuned?
- What's preventing us from testing Quartus right now?
My feeling is that Quartus is basically obsolete now, with the code having migrated to oneAPI. I don't know if we need to spend effort towards setting up synthesis tests for it.
Fair point. The other thing brought up at the meeting was whether we can rebalance the batching of tests somehow. Right now some of them take a few minutes while another takes 1:50h.
@JanFSchulte thanks for the review and the follow-up questions.
- Extending coverage: I’d like to bring synthesis to additional pytest files, but the pace depends on the CERN GitLab runner capacity. I’ll check the available concurrency/quotas and follow up with a better answer to this question. If possible, it would be great to hear which tests the community considers highest priority for synthesis so we can focus on those first.
- Tuning thresholds: the plan is to sync with the contributors who regularly use the Vivado/Vitis/oneAPI flows so we can calibrate the thresholds against real reports. I’m happy to incorporate suggestions as soon as we have them; I’m not confident about the best values for every metric, so any feedback is very appreciated.
- Balancing batches: the longest jobs come both from parametrized tests that explode into many slow combinations and from individual cases that are expensive to synthesize. To tackle that, I’m trimming the heaviest branches directly in the tests with small `if` guards around `run_synthesis_test`, and on the CI side this PR introduces `SPLIT_BY_TEST_CASE` so we can shard the slowest files. I had been experimenting with more advanced batching policies, but I intentionally shipped this simpler version first so we can gather experience and iterate based on the feedback. I would like to discuss this further.
Looks like the new tests actually have one failure in the latest run:
```
FAILED test_keras_api.py::test_conv2d[io_parallel-Vitis-Resource-same-channels_last] - AssertionError: BestLatency: expected 53.0, got 52.0 (tolerance=0%)
```
As for the other feedback, I agree that we should discuss these questions with the team. Would you be available to give an update in the hls4ml meeting on Friday next week @marco66colombo ?
The failure was caused by the default value for the tolerances, which was set to 0%.
I have set it to 1% now, and also updated the tolerance dictionary in `test/pytest/synthesis_helpers.py` to reflect the keys of the Vitis 2024.1 reports.
@JanFSchulte I can definitely give the update in the hls4ml meeting. Is it on Friday 14th? Or tomorrow?