[DONT MERGE] Test detection model with real weight and image

Open YosuaMichael opened this issue 2 years ago • 2 comments

I ran an experiment to test the detection models with real weights and a real image. This PR is meant only for experimenting on the CI and is not meant to be merged.

YosuaMichael, Sep 19 '22 15:09

@YosuaMichael from this, I see:

_________________ test_detection_model[cuda-fcos_resnet50_fpn] _________________
Traceback (most recent call last):
  File "/home/circleci/project/test/test_models.py", line 796, in check_out
    _assert_expected(output, model_name, prec=prec)
  File "/home/circleci/project/test/test_models.py", line 124, in _assert_expected
    torch.testing.assert_close(output, expected, rtol=rtol, atol=atol, check_dtype=False, check_device=False)
  File "/home/circleci/project/env/lib/python3.8/site-packages/torch/testing/_comparison.py", line 1342, in assert_close
    assert_equal(
  File "/home/circleci/project/env/lib/python3.8/site-packages/torch/testing/_comparison.py", line 1093, in assert_equal
    raise error_metas[0].to_error(msg)
AssertionError: The values for attribute 'shape' do not match: torch.Size([15, 4]) != torch.Size([14, 4]).

This clearly shows that minor differences in PyTorch's kernel implementations lead to slightly different results across platforms and hardware. I was really hoping that using real data and real weights would reduce the issues, but that doesn't seem to be the case. We should definitely try increasing the score threshold (`score_thresh`) to reduce the number of low-quality bboxes, and potentially focus on the top X boxes (as you proposed offline), to see if we get more stable results. If we can't fix the flakiness this way, we might need to review the testing strategy. Happy to chat more.
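To make the suggestion concrete, here is a minimal sketch of the idea: drop low-score boxes, keep only the top few, and compare those with a tolerance. It is plain Python for illustration and is not the code in `test/test_models.py`; the helper names `top_k_confident` and `boxes_close`, the threshold and tolerance values, and the sample boxes are all made up for this example.

```python
import math

def top_k_confident(boxes, scores, score_thresh=0.5, k=5):
    """Keep boxes whose score is >= score_thresh, best first, at most k of them."""
    kept = [(s, b) for s, b in zip(scores, boxes) if s >= score_thresh]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return [b for _, b in kept[:k]]

def boxes_close(a, b, rel_tol=1e-3, abs_tol=1e-3):
    """True if both lists hold the same number of boxes with near-equal coordinates."""
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for box_a, box_b in zip(a, b)
        for x, y in zip(box_a, box_b)
    )

# Hypothetical outputs from two platforms: platform A keeps one extra
# borderline box, so the raw outputs differ in shape (3 boxes vs 2).
boxes_a = [[10.0, 20.0, 110.0, 220.0], [30.0, 40.0, 90.0, 120.0], [5.0, 5.0, 15.0, 15.0]]
scores_a = [0.92, 0.81, 0.21]
boxes_b = [[10.0001, 20.0, 110.0, 219.9999], [30.0, 40.0001, 90.0, 120.0]]
scores_b = [0.92, 0.81]

raw_match = boxes_close(boxes_a, boxes_b)
filtered_match = boxes_close(
    top_k_confident(boxes_a, scores_a),
    top_k_confident(boxes_b, scores_b),
)
```

With this sample data the raw comparison fails on the box count alone, while the filtered comparison passes. A box whose score sits right at the threshold could still fall on different sides on different platforms, so this reduces the flakiness rather than ruling it out.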

datumbox, Sep 20 '22 09:09

NOTE: I also included the flaky autocast in the commit: Fix increase score_thresh for FCOS

YosuaMichael, Sep 20 '22 17:09