
GPU regression testing and reproducibility

Open clemekay opened this issue 1 year ago • 3 comments

Currently, GPU results do not exactly reproduce CPU results in the regression tests. This appears to be caused by an issue with Numba's memory re-allocation (#224, also here).

For now, we plan to create GPU-specific answer files to:

  • [ ] ensure GPU reproducibility within a single architecture when code changes are made,
  • [ ] ensure GPU reproducibility across architectures.

If Numba changes how it handles this memory issue, our reproducibility could break. The same risk applies to CPU mode, and it has not happened so far. Before supporting a new Numba release, we should check back on this issue.
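The answer-file plan above could be sketched roughly as follows. This is only an illustration, not MC/DC's actual test harness: the file names in `ANSWER_FILES`, the helper names, and the idea of storing tallies as plain lists of floats are all assumptions made for the example.

```python
import struct

# Hypothetical layout: one answer file per execution target.
# These file names are placeholders, not taken from the repository.
ANSWER_FILES = {"cpu": "answer_cpu.h5", "gpu": "answer_gpu.h5"}


def answer_file_for(target):
    """Return the (hypothetical) answer file name for a target."""
    try:
        return ANSWER_FILES[target]
    except KeyError:
        raise ValueError(f"unknown target: {target!r}")


def bits(x):
    """Raw 64-bit pattern of a float, so the comparison is exact."""
    return struct.pack("<d", x)


def mismatched_tallies(result, reference):
    """Names of tallies that differ bit-for-bit from the reference."""
    bad = []
    for name in sorted(reference):
        got = result.get(name)
        want = reference[name]
        if got is None or len(got) != len(want):
            # Missing tally or different length counts as a mismatch.
            bad.append(name)
        elif any(bits(a) != bits(b) for a, b in zip(got, want)):
            bad.append(name)
    return bad
```

Comparing bit patterns rather than using a tolerance matches the goal of exact reproducibility: a difference in the last bit of a single tally value is reported as a mismatch.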

clemekay avatar Aug 13 '24 20:08 clemekay

I want to make sure: (1) the issue (#224) also applies to the current CPU mode, and (2) the fact that GPU mode does not reproduce the CPU mode results may be due to other issues. Is that right, @braxtoncuneo?

ilhamv avatar Aug 14 '24 08:08 ilhamv

GPU is found to reproduce SOME of the CPU results. The following may be a useful reference to help point out what breaks the CPU-GPU reproducibility:

[Images: "Screenshot 2024-08-15 at 12 00 45 PM" and two further images]

cc: @braxtoncuneo, @jpmorgan98

ilhamv avatar Aug 15 '24 05:08 ilhamv

I attempted to create the GPU regression test keys here: https://github.com/ilhamv/MCDC/tree/gpu_regression_test

However, I found that the current GPU implementation is not reproducing its own solution for two problems:

  • inf_shem361_k_eigenvalue and
  • smrg7
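A check for this kind of self-reproducibility failure can be sketched as below. It is a minimal illustration that assumes each problem can be wrapped in a callable returning its results as a plain Python value. The name `first_nonreproducible_run` and the `run` callable are hypothetical and not part of MC/DC.

```python
def first_nonreproducible_run(run, n_repeats=3):
    """Call run() n_repeats times and compare each output with the first.

    Returns the index of the first repeat whose output differs from
    run 0, or None if every repeat matches exactly.
    """
    baseline = run()
    for i in range(1, n_repeats):
        if run() != baseline:
            return i
    return None
```

Note that `!=` on floats treats NaN as unequal to itself, so a result containing NaN would be flagged as non-reproducible under this sketch.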

cc: @braxtoncuneo, @jpmorgan98

ilhamv avatar Aug 15 '24 05:08 ilhamv

Has this been addressed, @jpmorgan98 @braxtoncuneo ?

ilhamv avatar Jan 26 '25 23:01 ilhamv

Addressed in #316

clemekay avatar Apr 16 '25 19:04 clemekay