pytest-based synthesis tests
Description
This PR introduces synthesis tests into the pytest framework for hls4ml.
The goal of this change is to automate the validation of HLS synthesis reports against predefined baselines, so that stability and correctness can be tracked over time.
Main Changes
- Baselines are shipped with this repository under `test/pytest/baselines/<backend>/<version>/<report>.json`. Each test case specifies its baseline name, matching the artifact names produced by CI so comparisons stay aligned. Baselines must be refreshed within the PR that changes the synthesis expectations; otherwise the tests will fail.
- `test/pytest/conftest.py` fixture: introduces the shared `synthesis_config` fixture that drives every synthesis test (see the fixture sketch after this list). It gathers:
  - whether synthesis should run (`RUN_SYNTHESIS`, default `false` if the env var is unset);
  - tool versions (`VIVADO_VERSION`, `VITIS_VERSION`, `QUARTUS_VERSION`, `ONEAPI_VERSION`), with defaults if the environment variables are unset;
  - backend-specific build arguments.
- Synthesis helper (`test/pytest/synthesis_helpers.py`), see the helper sketch after this list:
  - Main entry point: `run_synthesis_test`.
  - When `run_synthesis` is disabled the helper returns early; otherwise it builds, saves the report, and compares it to the selected baseline.
  - Quartus remains skipped, while Vivado, Vitis, and oneAPI execute normally.
  - Missing tools, missing baselines, or mismatched reports trigger `pytest.fail(...)`, so synthesis regressions surface immediately instead of being silently skipped.
  - The helper persists the generated reports as artifacts, which also helps bootstrap new baselines when they do not exist yet.
- Tolerances: backend-specific tolerances are still provisional and will need further tuning as we collect more data.
- CI integration (`test/pytest/ci-template.yml`):
  - Vivado and Vitis run inside Apptainer containers, sourcing the toolchains from CVMFS. This keeps the CI image smaller while letting us pick tool versions job by job.
  - oneAPI remains installed in the image for now and will move to the Apptainer container flow in a follow-up.
  - CI stores the synthesis reports as artifacts so we can compare performance against the tracked baselines.
- `generate_ci_yaml.py` now has a `SPLIT_BY_TEST_CASE` map. For any test file listed there, the script inspects its test functions, batches them according to the configured chunk size, and emits separate CI jobs (each passing `PYTESTFILE="path::test_fn"`). This keeps heavy synthesis suites like `test_keras_api` sharded without touching the test code (see the sharding sketch after this list).
- oneAPI report parsing has been hardened to tolerate values such as `n/a`, avoiding the previous `invalid literal for int() with base 10: 'n/a'` error (see the parsing sketch after this list).
- Current test coverage: `test_keras_api.py` is the first pytest using the synthesis flow. Each parametrized test now takes the `synthesis_config` fixture, formats a baseline filename (e.g. `hls4mlprj_keras_api_dense_{backend}_{io_type}.json`), and calls `run_synthesis_test(...)` after the assertions. Some cases (such as `test_conv2d`) guard the call with backend/strategy filters to avoid unsupported configurations or syntheses that take too long for the jobs.
  - To enable synthesis elsewhere, reproduce the same pattern.
  - Vivado, Vitis, and oneAPI synthesis paths are active; Quartus is currently not supported.
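Fixture sketch: a minimal outline of what `synthesis_config` gathers, for anyone wiring it into other test files. The default tool versions and the exact dictionary layout below are illustrative; the authoritative version lives in `test/pytest/conftest.py`.

```python
# conftest.py -- minimal sketch of the shared fixture described above.
# Default versions and the dictionary layout are placeholders.
import os

import pytest


def _env_flag(name, default=False):
    """Read a boolean flag from the environment; unset means `default`."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("true", "1", "yes")


@pytest.fixture(scope="session")
def synthesis_config():
    """Everything a synthesis test needs: run flag, tool versions, build args."""
    return {
        "run_synthesis": _env_flag("RUN_SYNTHESIS", default=False),
        "tool_versions": {
            "Vivado": os.environ.get("VIVADO_VERSION", "2020.1"),
            "Vitis": os.environ.get("VITIS_VERSION", "2024.1"),
            "Quartus": os.environ.get("QUARTUS_VERSION", "latest"),
            "oneAPI": os.environ.get("ONEAPI_VERSION", "2025.0"),
        },
        # Backend-specific build arguments passed to hls_model.build(...);
        # the entries below are examples, not the committed values.
        "build_args": {
            "Vivado": {"csim": False, "synth": True, "export": False},
            "Vitis": {"csim": False, "synth": True, "export": False},
        },
    }
```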
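Helper sketch: a rough outline of the control flow in `run_synthesis_test`, including the tolerance-based comparison. Function signatures, report keys, numeric report values, and the tolerance numbers are assumptions for illustration; the real logic lives in `test/pytest/synthesis_helpers.py`.

```python
# synthesis_helpers.py -- rough sketch of the control flow described above.
# Signatures, report keys and tolerance values are illustrative only.
import json
from pathlib import Path

import pytest

# Provisional, backend-specific tolerances in percent (placeholder values).
TOLERANCES = {
    "Vitis": {"BestLatency": 1.0, "WorstLatency": 1.0, "LUT": 5.0, "FF": 5.0},
}


def compare_reports(report, baseline, tolerances):
    """Fail the test if any baseline metric deviates beyond its tolerance."""
    for key, expected in baseline.items():
        got = report.get(key)
        tol = tolerances.get(key, 1.0)  # default when a metric has no explicit entry (placeholder)
        if got is None or abs(got - expected) > abs(expected) * tol / 100.0:
            pytest.fail(f"{key}: expected {expected}, got {got} (tolerance={tol}%)")


def run_synthesis_test(config, hls_model, baseline_file_name, backend):
    """Build the project, persist the report, and compare it to the baseline."""
    if not config["run_synthesis"]:
        return  # synthesis disabled: return early, nothing to check
    if backend == "Quartus":
        return  # Quartus is skipped for now

    report = hls_model.build(**config["build_args"].get(backend, {}))

    # Persist the report so CI can expose it as an artifact (and so it can
    # seed a new baseline when none exists yet).
    out = Path("reports") / baseline_file_name
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(report, default=str))

    version = config["tool_versions"][backend]
    baseline_path = Path(__file__).parent / "baselines" / backend / version / baseline_file_name
    if not baseline_path.exists():
        pytest.fail(f"Missing baseline: {baseline_path}")
    compare_reports(report, json.loads(baseline_path.read_text()), TOLERANCES.get(backend, {}))
```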
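Sharding sketch: the `SPLIT_BY_TEST_CASE` mechanism boils down to enumerating the test functions of the listed files and chunking them into separate jobs. The map contents, chunk size, and exact `PYTESTFILE` formatting below are illustrative, not the committed values.

```python
# generate_ci_yaml.py -- sketch of the SPLIT_BY_TEST_CASE idea described above.
import ast
from pathlib import Path

SPLIT_BY_TEST_CASE = {
    # test file to shard -> number of test functions per generated CI job
    "test_keras_api.py": 1,
}


def collect_test_functions(test_file):
    """Return the names of the top-level test_* functions in a pytest file."""
    tree = ast.parse(Path(test_file).read_text())
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name.startswith("test_")
    ]


def pytestfile_values(test_file):
    """Yield one PYTESTFILE value per CI job, e.g. 'test_keras_api.py::test_dense'."""
    chunk = SPLIT_BY_TEST_CASE[test_file]
    names = collect_test_functions(test_file)
    for i in range(0, len(names), chunk):
        yield " ".join(f"{test_file}::{name}" for name in names[i : i + chunk])
```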
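Parsing sketch: the hardened oneAPI number parsing amounts to treating placeholder cells as missing instead of calling `int()` on them directly. A possible shape of that guard (the function name is hypothetical):

```python
def parse_oneapi_int(raw, default=None):
    """Convert a oneAPI report cell to int, tolerating placeholders like 'n/a'."""
    if raw is None:
        return default
    text = str(raw).strip().replace(",", "")
    if text.lower() in ("", "n/a", "na", "-"):
        return default
    try:
        return int(float(text))  # handles both '123' and '123.0'
    except ValueError:
        return default
```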
Type of change
- [x] Other: improvement of testing infrastructure using pytest
Tests
Synthesis tests run conditionally and are compared against versioned baselines when `RUN_SYNTHESIS=true` is set in the environment, following the pattern sketched below.
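For reference, the opt-in pattern in a parametrized test looks roughly like this; the exact signature of `run_synthesis_test` is assumed from the description above, and `build_hls_model` is a hypothetical stand-in for the model construction already present in the test.

```python
# test_keras_api.py -- sketch of the opt-in pattern; build_hls_model is a
# hypothetical placeholder for the existing test setup code.
import pytest

from synthesis_helpers import run_synthesis_test


@pytest.mark.parametrize("backend", ["Vivado", "Vitis", "oneAPI"])
@pytest.mark.parametrize("io_type", ["io_parallel", "io_stream"])
def test_dense(synthesis_config, backend, io_type):
    # Existing behaviour: build the hls4ml project and run the numerical checks.
    hls_model = build_hls_model(backend, io_type)  # hypothetical helper

    # New behaviour: pick the baseline name and hand off to the synthesis helper.
    baseline_file_name = f"hls4mlprj_keras_api_dense_{backend}_{io_type}.json"
    run_synthesis_test(synthesis_config, hls_model, baseline_file_name, backend)
```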
Checklist
- [x] I have read the guidelines for contributing.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have installed and run `pre-commit` on the files I edited or added.
- [ ] I have added tests that prove my feature works.
Can we test this by actually running the synthesis tests?
Yes, but if the env var `RUN_SYNTHESIS` is not set, the default is `False`. I can add a line in `ci-template.yml` to export `RUN_SYNTHESIS=True`.
I added support for comparing the new oneAPI report and modified `generate_ci_yaml.py` to run only `test_keras_api.py`. This last change is only temporary (this is a draft PR), aimed at focusing on the tests where synthesis is supported, to speed up debugging and avoid overloading the runners.
Do you all agree with the approach of storing all the baselines in a dedicated GitHub repo and adding it as a module under `test/pytest/baselines` in hls4ml?
If yes, we need to create the repository within Fastmachinelearning, and then I can upload the artifacts generated by the synthesis tests as the baselines.
I generally agree about storing the baselines in a dedicated repo. Is CERN GitLab or regular GitHub preferred? We create the testing containers in GitLab. I don't really have a preference.
Thanks, this looks very nice to me! From the test logs, it seems to have triggered the synthesis tests successfully and the code looks good to me, so I think this is basically ready for merge. I just have a few questions for the way forward:
- What are the plans to extend this to more tests? How much can we realistically do with the resources we have available?
- As you said, the thresholds are preliminary, how are these going to be tuned?
- What's preventing us from testing Quartus right now?
My feeling is that Quartus is basically obsolete now, with the code having migrated to oneAPI. I don't know if we need to spend effort towards setting up synthesis tests for it.
Fair point. The other thing brought up at the meeting was whether we can rebalance the batching of tests somehow. Right now some of them take a few minutes while another takes 1:50h.
@JanFSchulte thanks for the review and the follow-up questions.
- Extending coverage: I’d like to bring synthesis to additional pytest files, but the pace depends on the CERN GitLab runner capacity. I’ll check the available concurrency/quotas and follow up with a better answer to this question. If possible, it would be great to hear which tests the community considers highest priority for synthesis so we can focus on those first.
- Tuning thresholds: the plan is to sync with the contributors who regularly use the Vivado/Vitis/oneAPI flows so we can calibrate the thresholds against real reports. I’m happy to incorporate suggestions as soon as we have them; I’m not confident about the best values for every metric, so any feedback is very appreciated.
- Balancing batches: the longest jobs come both from parametrized tests that explode into many slow combinations and from individual cases that are expensive to synthesize. To tackle that, I’m trimming the heaviest branches directly in the tests with small `if` guards around `run_synthesis_test`, and on the CI side this PR introduces `SPLIT_BY_TEST_CASE` so we can shard the slowest files. I had been experimenting with more advanced batching policies, but I intentionally shipped this simpler version first so we can gather experience and iterate based on the feedback. I would like to discuss this further.
Looks like the new tests actually have one failure in the latest run:
```
FAILED test_keras_api.py::test_conv2d[io_parallel-Vitis-Resource-same-channels_last] - AssertionError: BestLatency: expected 53.0, got 52.0 (tolerance=0%)
```
As for the other feedback, I agree that we should discuss these questions with the team. Would you be available to give an update in the hls4ml meeting on Friday next week @marco66colombo ?
The failure was caused by the default value for the tolerances, which was set to 0%.
I have set it to 1% now, and also updated the tolerance dictionary in `test/pytest/synthesis_helpers.py` to reflect the keys of the Vitis 2024.1 reports.
@JanFSchulte I can definitely give the update in the hls4ml meeting. Is it on Friday 14th? Or tomorrow?