
inconsistency in tests/compare_utils.py, AssertionError: Scalars are not close!

RJ3 opened this issue 1 year ago · 2 comments

One of the tests fails intermittently: re-running it with no changes may let it pass the next time. Possibly the precision needs to be adjusted, or the allowed difference threshold increased?

FAIL: test_compare_model (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Run full model with and without using DeepSpeed Evoformer attention kernel
----------------------------------------------------------------------
Traceback (most recent call last):
  File "openfold/tests/test_deepspeed_evo_attention.py", line 334, in test_compare_model
    compare_utils.assert_mean_abs_diff_small(out_repro, out_repro_ds, eps)
  File "openfold/tests/compare_utils.py", line 139, in assert_mean_abs_diff_small
    _assert_abs_diff_small_base(torch.mean, expected, actual, eps)
  File "openfold/tests/compare_utils.py", line 131, in _assert_abs_diff_small_base
    torch.testing.assert_close(err, zero_tensor, atol=eps, rtol=rtol)
  File "micromamba/envs/openfold-pl/lib/python3.10/site-packages/torch/testing/_comparison.py", line 1520, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Scalars are not close!

Expected 0.0 but got 0.4344009459018707.
Absolute difference: 0.4344009459018707 (up to 0.2 allowed)
Relative difference: inf (up to 1.3e-06 allowed)

----------------------------------------------------------------------
Ran 117 tests in 30.009s

FAILED (failures=1, skipped=41)

Test(s) failed. Make sure you've installed all Python dependencies.
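For context, `torch.testing.assert_close` accepts a value when `|actual - expected| <= atol + rtol * |expected|`. Since the test compares the mean absolute difference against an expected value of 0.0, the `rtol` term contributes nothing (which is also why the report shows "Relative difference: inf"), so only `atol` governs the outcome. A minimal pure-Python sketch of that check (a hypothetical mirror, not the actual `torch.testing` implementation):

```python
def is_close(actual, expected, atol, rtol):
    """Hypothetical mirror of torch.testing.assert_close's tolerance rule:
    pass iff |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)

# Values from the failure above: expected is 0.0, so the rtol term vanishes
# and only atol matters.
print(is_close(0.4344009459018707, 0.0, atol=0.2, rtol=1.3e-06))  # False (the observed failure)
print(is_close(0.4344009459018707, 0.0, atol=0.5, rtol=1.3e-06))  # True with a larger atol
```

This suggests that if the divergence between the kernel and non-kernel paths is genuinely nondeterministic, raising `eps` (the `atol` passed in `assert_mean_abs_diff_small`) is the lever that would stop the flakiness; tightening `rtol` would have no effect here.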

branch: pl_upgrades

RJ3 · Aug 25 '24