[DONT MERGE] Test detection model with real weight and image
I experimented with testing the detection models using real weights and a real image. This PR is meant for experimenting on the CI and is not meant to be merged.
@YosuaMichael from this, I see:
_________________ test_detection_model[cuda-fcos_resnet50_fpn] _________________
Traceback (most recent call last):
  File "/home/circleci/project/test/test_models.py", line 796, in check_out
    _assert_expected(output, model_name, prec=prec)
  File "/home/circleci/project/test/test_models.py", line 124, in _assert_expected
    torch.testing.assert_close(output, expected, rtol=rtol, atol=atol, check_dtype=False, check_device=False)
  File "/home/circleci/project/env/lib/python3.8/site-packages/torch/testing/_comparison.py", line 1342, in assert_close
    assert_equal(
  File "/home/circleci/project/env/lib/python3.8/site-packages/torch/testing/_comparison.py", line 1093, in assert_equal
    raise error_metas[0].to_error(msg)
AssertionError: The values for attribute 'shape' do not match: torch.Size([15, 4]) != torch.Size([14, 4]).
This clearly shows that minor differences in PyTorch's kernel implementations lead to slightly different results across platforms and hardware: here one run produced 15 boxes while the expected output has 14. I was really hoping that using real data and real weights would reduce the issues, but that doesn't seem to be the case. We should definitely try increasing the score threshold (score_thresh) to reduce the number of low-quality bboxes, and potentially focus on the top X boxes (as you proposed offline), to see if you get more stable results. If we can't fix the flakiness this way, we might need to review the testing strategy. Happy to chat more.
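To make the proposal concrete, here is a minimal sketch of the "raise the score threshold, then compare only the top X boxes" idea. It is written in plain Python so it runs without PyTorch. The helper names (`top_k_detections`, `detections_close`), the thresholds, and the box and score values are all hypothetical and are not taken from `test_models.py`. In the real test the same logic would operate on the tensors in the model output.

```python
import math

def top_k_detections(boxes, scores, score_thresh=0.5, k=5):
    """Keep detections scoring at least score_thresh, then the k best by score."""
    kept = [(s, b) for s, b in zip(scores, boxes) if s >= score_thresh]
    kept.sort(key=lambda sb: sb[0], reverse=True)  # highest score first
    return kept[:k]

def detections_close(actual, expected, rel_tol=1e-3, abs_tol=1e-3):
    """Compare two (score, box) lists element-wise within a tolerance."""
    if len(actual) != len(expected):
        return False
    for (s_a, b_a), (s_e, b_e) in zip(actual, expected):
        if not math.isclose(s_a, s_e, rel_tol=rel_tol, abs_tol=abs_tol):
            return False
        if any(not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
               for x, y in zip(b_a, b_e)):
            return False
    return True

# Made-up data: the third detection sits right at a low threshold of 0.05,
# so tiny numerical differences decide whether it is kept at all.
expected_boxes = [[10.0, 10.0, 50.0, 50.0], [20.0, 30.0, 80.0, 90.0], [5.0, 5.0, 15.0, 15.0]]
expected_scores = [0.92, 0.81, 0.049]
actual_boxes = [[10.0002, 10.0, 50.0, 50.0001], [20.0, 30.0, 80.0, 90.0], [5.0, 5.0, 15.0, 15.0]]
actual_scores = [0.9201, 0.8099, 0.051]

# Low threshold: the number of boxes differs, which mirrors the shape mismatch.
low_actual = top_k_detections(actual_boxes, actual_scores, score_thresh=0.05, k=100)
low_expected = top_k_detections(expected_boxes, expected_scores, score_thresh=0.05, k=100)

# Higher threshold plus top-k: only confident detections are compared.
top_actual = top_k_detections(actual_boxes, actual_scores, score_thresh=0.5, k=2)
top_expected = top_k_detections(expected_boxes, expected_scores, score_thresh=0.5, k=2)
```

With these values, the low-threshold comparison fails on the box count alone, while the top-2 comparison passes within tolerance. This only reduces the risk: two detections with nearly equal scores could still swap order across platforms, so the choice of threshold and X would need to be validated on the CI.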
NOTE: I also included the flaky autocast case in the commit: Fix increase score_thresh for FCOS