
[BUG] Investigate `test_sim_grad.py` failures

Open shi-eric opened this issue 11 months ago • 3 comments

Bug Description

The `test_sphere_pushing_on_rails_*` tests have been failing intermittently on Linux and Windows. I've disabled them for now by renaming `warp/tests/{test_sim_grad.py => flaky_test_sim_grad.py}` so that they don't block merge requests.

Windows example

======================================================================
test_sphere_pushing_on_rails_d6_cuda_0 (warp.tests.test_sim_grad.TestSimGradients)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\g\136337560\warp\tests\unittest_utils.py", line 248, in test_func
    func(self, device, **kwargs)
  File "C:\g\136337560\warp\tests\test_sim_grad.py", line 274, in test_fn
    return test_sphere_pushing_on_rails(
  File "C:\g\136337560\warp\tests\test_sim_grad.py", line 260, in test_sphere_pushing_on_rails
    gradcheck(rollout, [action_too_close], device=device, eps=0.2, tol=tol, print_grad=print_grad)
  File "C:\g\136337560\warp\tests\test_sim_grad.py", line 82, in gradcheck
    assert_np_equal(ad_grad, fd_grad, tol=tol)
  File "C:\g\136337560\warp\tests\unittest_utils.py", line 236, in assert_np_equal
    np.testing.assert_allclose(result.flatten(), expect.flatten(), atol=tol, equal_nan=True)
  File "C:\g\136337560\_venv\lib\site-packages\numpy\testing\_private\utils.py", line 1592, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "C:\g\136337560\_build\target-deps\python\lib\contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "C:\g\136337560\_venv\lib\site-packages\numpy\testing\_private\utils.py", line 862, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0.2
Mismatched elements: 2 / 2 (100%)
Max absolute difference: 0.30897596
Max relative difference: 28.690039
 x: array([-0.319745, -0.319745], dtype=float32)
 y: array([-0.010769, -0.010769], dtype=float32)
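
For reference, the numbers in this failure can be checked by hand. The sketch below is not taken from the Warp test suite; it only re-runs the same `np.testing.assert_allclose` call on the two arrays printed above, assuming `tol` was 0.2, as the `atol=0.2` in the message suggests.

```python
import numpy as np

# Values copied from the failure message above.
ad_grad = np.array([-0.319745, -0.319745], dtype=np.float32)  # "x" (actual)
fd_grad = np.array([-0.010769, -0.010769], dtype=np.float32)  # "y" (desired)
atol, rtol = 0.2, 1e-7  # rtol is the assert_allclose default shown in the message

# assert_allclose passes when |actual - desired| <= atol + rtol * |desired|.
abs_diff = np.abs(ad_grad - fd_grad)
allowed = atol + rtol * np.abs(fd_grad)
violations = abs_diff > allowed

print(abs_diff.max())  # roughly 0.309, above the allowed ~0.2
print(violations)      # both elements exceed the tolerance

# Re-run the same comparison the test helper performs.
try:
    np.testing.assert_allclose(ad_grad, fd_grad, atol=atol, equal_nan=True)
    check_failed = False
except AssertionError:
    check_failed = True
```

The absolute difference of about 0.309 matches the "Max absolute difference" line in the log, so the failure is a genuine tolerance miss rather than a NaN or shape problem.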

Linux example

======================================================================
test_sphere_pushing_on_rails_d6_cuda_0 (warp.tests.test_sim_grad.TestSimGradients)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builds/omniverse/warp/warp/tests/unittest_utils.py", line 248, in test_func
    func(self, device, **kwargs)
  File "/builds/omniverse/warp/warp/tests/test_sim_grad.py", line 274, in test_fn
    return test_sphere_pushing_on_rails(
  File "/builds/omniverse/warp/warp/tests/test_sim_grad.py", line 260, in test_sphere_pushing_on_rails
    gradcheck(rollout, [action_too_close], device=device, eps=0.2, tol=tol, print_grad=print_grad)
  File "/builds/omniverse/warp/warp/tests/test_sim_grad.py", line 84, in gradcheck
    assert np.allclose(ad_grad * fd_grad > 0, True)
AssertionError
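
This second failure mode comes from the sign check on line 84 of `gradcheck` rather than from the tolerance comparison. As a small illustration with made-up gradient values (none of these numbers come from the CI logs), the expression `ad_grad * fd_grad > 0` is true only where both gradients are non-zero and share a sign, so a single sign flip or an exactly zero component is enough to trip the assertion. The helper name `same_sign` is hypothetical, and it uses `np.all`, which is assumed to be equivalent in intent to the `np.allclose(..., True)` call in the traceback.

```python
import numpy as np

def same_sign(ad_grad, fd_grad):
    """True only if every element-wise product is strictly positive."""
    return bool(np.all(ad_grad * fd_grad > 0))

# Hypothetical values for illustration only.
agree = same_sign(np.array([-0.3, -0.3]), np.array([-0.01, -0.01]))
flipped = same_sign(np.array([-0.3, -0.3]), np.array([-0.01, 0.01]))
zero = same_sign(np.array([-0.3, 0.0]), np.array([-0.01, -0.01]))

print(agree, flipped, zero)
```

Only the first case passes; the sign flip and the zero component both make the check fail.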

Linux example

======================================================================
test_sphere_pushing_on_rails_d6_cuda_0 (warp.tests.test_sim_grad.TestSimGradients)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builds/omniverse/warp/warp/tests/unittest_utils.py", line 248, in test_func
    func(self, device, **kwargs)
  File "/builds/omniverse/warp/warp/tests/test_sim_grad.py", line 274, in test_fn
    return test_sphere_pushing_on_rails(
  File "/builds/omniverse/warp/warp/tests/test_sim_grad.py", line 260, in test_sphere_pushing_on_rails
    gradcheck(rollout, [action_too_close], device=device, eps=0.2, tol=tol, print_grad=print_grad)
  File "/builds/omniverse/warp/warp/tests/test_sim_grad.py", line 82, in gradcheck
    assert_np_equal(ad_grad, fd_grad, tol=tol)
  File "/builds/omniverse/warp/warp/tests/unittest_utils.py", line 236, in assert_np_equal
    np.testing.assert_allclose(result.flatten(), expect.flatten(), atol=tol, equal_nan=True)
  File "/builds/omniverse/warp/_venv/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 1592, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/builds/omniverse/warp/_build/target-deps/python/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/builds/omniverse/warp/_venv/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 862, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0.2
Mismatched elements: 2 / 2 (100%)
Max absolute difference: 0.30897596
Max relative difference: 28.690039
 x: array([-0.319745, -0.319745], dtype=float32)
 y: array([-0.010769, -0.010769], dtype=float32)

Windows example

======================================================================
test_sphere_pushing_on_rails_d6_cuda_0 (warp.tests.test_sim_grad.TestSimGradients)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\g\136260490\warp\tests\unittest_utils.py", line 248, in test_func
    func(self, device, **kwargs)
  File "C:\g\136260490\warp\tests\test_sim_grad.py", line 274, in test_fn
    return test_sphere_pushing_on_rails(
  File "C:\g\136260490\warp\tests\test_sim_grad.py", line 260, in test_sphere_pushing_on_rails
    gradcheck(rollout, [action_too_close], device=device, eps=0.2, tol=tol, print_grad=print_grad)
  File "C:\g\136260490\warp\tests\test_sim_grad.py", line 82, in gradcheck
    assert_np_equal(ad_grad, fd_grad, tol=tol)
  File "C:\g\136260490\warp\tests\unittest_utils.py", line 236, in assert_np_equal
    np.testing.assert_allclose(result.flatten(), expect.flatten(), atol=tol, equal_nan=True)
  File "C:\g\136260490\_venv\lib\site-packages\numpy\testing\_private\utils.py", line 1684, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "C:\g\136260490\_build\target-deps\python\lib\contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "C:\g\136260490\_venv\lib\site-packages\numpy\testing\_private\utils.py", line 885, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0.2
Mismatched elements: 2 / 2 (100%)
Max absolute difference among violations: 0.30897596
Max relative difference among violations: 28.690039
 ACTUAL: array([-0.319745, -0.319745], dtype=float32)
 DESIRED: array([-0.010769, -0.010769], dtype=float32)

Linux example

======================================================================
test_sphere_pushing_on_rails_d6_cuda_0 (warp.tests.test_sim_grad.TestSimGradients)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builds/omniverse/warp/warp/tests/unittest_utils.py", line 248, in test_func
    func(self, device, **kwargs)
  File "/builds/omniverse/warp/warp/tests/test_sim_grad.py", line 274, in test_fn
    return test_sphere_pushing_on_rails(
  File "/builds/omniverse/warp/warp/tests/test_sim_grad.py", line 260, in test_sphere_pushing_on_rails
    gradcheck(rollout, [action_too_close], device=device, eps=0.2, tol=tol, print_grad=print_grad)
  File "/builds/omniverse/warp/warp/tests/test_sim_grad.py", line 84, in gradcheck
    assert np.allclose(ad_grad * fd_grad > 0, True)
AssertionError
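
Both failure modes above originate in `gradcheck`, which compares an autodiff gradient (`ad_grad`) against a finite-difference estimate (`fd_grad`). The actual helper in `test_sim_grad.py` is not shown in this issue, so the following is only a NumPy sketch of that kind of check, assuming central differences. The names `fd_gradient`, `gradcheck_sketch`, `toy_loss`, and `toy_grad` are hypothetical, and the toy quadratic stands in for the simulation rollout.

```python
import numpy as np

def fd_gradient(func, x, eps):
    """Central finite-difference estimate of the gradient of a scalar-valued func."""
    x = np.asarray(x, dtype=np.float64)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step.flat[i] = eps
        grad.flat[i] = (func(x + step) - func(x - step)) / (2.0 * eps)
    return grad

def gradcheck_sketch(func, analytic_grad, x, eps, tol):
    """Return (ok, ad_grad, fd_grad) for the two checks seen in the tracebacks."""
    ad_grad = np.asarray(analytic_grad(x), dtype=np.float64)
    fd_grad = fd_gradient(func, x, eps)
    close = np.allclose(ad_grad, fd_grad, rtol=0.0, atol=tol)  # tolerance check
    same_sign = np.all(ad_grad * fd_grad > 0)                  # sign check
    return bool(close and same_sign), ad_grad, fd_grad

# Toy stand-ins (assumptions): a smooth quadratic and its exact gradient.
def toy_loss(x):
    return float(np.sum(x ** 2))

def toy_grad(x):
    return 2.0 * np.asarray(x)

x0 = np.array([1.0, -2.0])
ok, ad_grad, fd_grad = gradcheck_sketch(toy_loss, toy_grad, x0, eps=0.2, tol=0.2)
print(ok, ad_grad, fd_grad)
```

The toy quadratic passes for any `eps` because central differences are exact for it; it is only meant to make the two assertions concrete, not to explain the flakiness.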

System Information

No response

shi-eric, Jan 25 '25 07:01

BTW @eric-heiden it seems that this issue still isn't quite fixed:

======================================================================
test_sphere_pushing_on_rails_d6_cpu (warp.tests.sim.test_sim_grad.TestSimGradients.test_sphere_pushing_on_rails_d6_cpu)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/warp/warp/warp/tests/unittest_utils.py", line 256, in test_func
    func(self, device, **kwargs)
  File "/home/runner/work/warp/warp/warp/tests/sim/test_sim_grad.py", line 286, in test_fn
    return test_sphere_pushing_on_rails(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/warp/warp/warp/tests/sim/test_sim_grad.py", line 264, in test_sphere_pushing_on_rails
    gradcheck(rollout, [action_too_far], device=device, eps=0.2, tol=tol, print_grad=print_grad)
  File "/home/runner/work/warp/warp/warp/tests/sim/test_sim_grad.py", line 91, in gradcheck
    assert_np_equal(ad_grad, fd_grad, tol=tol)
  File "/home/runner/work/warp/warp/warp/tests/unittest_utils.py", line 244, in assert_np_equal
    np.testing.assert_allclose(result.flatten(), expect.flatten(), atol=tol, equal_nan=True)
  File "/opt/hostedtoolcache/Python/3.12.10/arm64/lib/python3.12/site-packages/numpy/testing/_private/utils.py", line 1715, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/opt/hostedtoolcache/Python/3.12.10/arm64/lib/python3.12/site-packages/numpy/testing/_private/utils.py", line 921, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0.2

Mismatched elements: 1 / 2 (50%)
Max absolute difference among violations: 4.56534
Max relative difference among violations: 33.349747
 ACTUAL: array([0.166123, 4.702233], dtype=float32)
 DESIRED: array([0.136893, 0.136893], dtype=float32)

From https://github.com/NVIDIA/warp/actions/runs/14466352296/job/40569876293

shi-eric, Apr 15 '25 12:04

Does this merged MR fix this bug? https://gitlab-master.nvidia.com/omniverse/warp/-/merge_requests/1243

If yes, can we close it?

momo-van, Apr 16 '25 19:04

No, see my last comment. There are still bugs.

shi-eric, Apr 17 '25 00:04