training_extensions

Enable explain integration tests for "test_engine_from_config"

GalyaZalesskaya opened this issue 1 year ago • 5 comments

Summary

How to test

Checklist

  • [ ] I have added unit tests to cover my changes.
  • [ ] I have added integration tests to cover my changes.
  • [ ] I have added e2e tests for validation.
  • [ ] I have added the description of my changes into CHANGELOG in my target branch (e.g., CHANGELOG in develop).
  • [ ] I have updated the documentation in my target branch accordingly (e.g., documentation in develop).
  • [ ] I have linked related issues.

License

  • [ ] I submit my code changes under the same Apache License that covers the project. Feel free to contact the maintainers if that's a concern.
  • [ ] I have updated the license header for each file (see an example below).
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

GalyaZalesskaya avatar Mar 19 '24 16:03 GalyaZalesskaya

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.18%. Comparing base (10f66e8) to head (c200873).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3164      +/-   ##
===========================================
- Coverage    64.18%   64.18%   -0.01%     
===========================================
  Files          182      182              
  Lines        15067    15067              
===========================================
- Hits          9671     9670       -1     
- Misses        5396     5397       +1     
Flag    Coverage Δ
py310   64.18% <ø> (-0.01%) ↓
py311   64.18% <ø> (-0.01%) ↓

Flags with carried forward coverage won't be shown.

codecov[bot] avatar Mar 19 '24 16:03 codecov[bot]

Here is an interesting behavior of the integration tests that shows the impact of one test on another. Ideally, test instances should run independently, so I'm afraid there is some deeper reason in the device settings.

After enabling test_engine_from_config for DETECTION models, it passes, but it causes ATSS model training and prediction in subsequent tests to fail with an error message showing that the inputs and the weights are not on the same device: RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.FloatTensor) should be the same.

Steps to reproduce:

  • Running the tests independently: the test_xai.py tests pass
pytest tests/integration/api/test_xai.py --task=DETECTION

17 passed, 1 skipped, 208 warnings in 187.35s (0:03:07) 
  • Running the whole pipeline, including test_engine_from_config, causes the tests in test_xai.py to fail during model inference
pytest /home/gzalessk/code/training_extensions/tests/integration/api --task=DETECTION

FAILED tests/integration/api/test_engine_api.py::test_engine_from_tile_recipe[gpu-/home/gzalessk/code/training_extensions/src/otx/recipe/detection/atss_mobilenetv2_tile.yaml] - RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
FAILED tests/integration/api/test_xai.py::test_forward_explain[gpu-/home/gzalessk/code/training_extensions/src/otx/recipe/detection/atss_mobilenetv2.yaml] - RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.FloatTensor) should be the same
FAILED tests/integration/api/test_xai.py::test_forward_explain[gpu-/home/gzalessk/code/training_extensions/src/otx/recipe/detection/atss_resnext101.yaml] - RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.FloatTensor) should be the same
FAILED tests/integration/api/test_xai.py::test_forward_explain[gpu-/home/gzalessk/code/training_extensions/src/otx/recipe/detection/atss_r50_fpn.yaml] - RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.FloatTensor) should be the same

4 failed, 21 passed, 1 skipped, 

This behavior occurs only for ATSS (detection task) and Mask R-CNN-based (instance segmentation task) models.

I think, it can be connected to the device and accelerator settings for inferences in test_engine_from_config. @harimkang Have you seen something like that?

GalyaZalesskaya avatar Mar 19 '24 17:03 GalyaZalesskaya


@eugene123tw Hi. When applying this change to the latest commit, it seems to affect other tests, as Galina said. Currently the tiling-related tests are not working; could you please take a look?

harimkang avatar Mar 20 '24 01:03 harimkang

@harimkang @eugene123tw It seems that the tiling tests are failing for the same reason as test_xai.py:

  • they run after test_engine_from_config
  • they fail during engine.train with the same error: RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.FloatTensor) should be the same
  • commenting out the tiling tests doesn't fix the underlying problem that one test affects another.

So it seems to me that the failure of the tiling tests is a consequence of the problems with test_engine_from_config, not the root cause.
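One blunt way to confirm that the leak lives in in-process global state (rather than in the test code itself) is to give each test run a fresh interpreter, since no Python-level global can survive a process boundary. This is only a diagnostic sketch using the standard library; in practice the equivalent would be running each test file as a separate `pytest` invocation (or using a process-isolation plugin such as pytest-forked):

```python
# Sketch: execute each snippet in a brand-new Python process so that no
# global device/precision state can survive from one run to the next.
import subprocess
import sys

def run_isolated(code: str) -> int:
    """Run a snippet in a fresh interpreter and return its exit code."""
    return subprocess.run([sys.executable, "-c", code]).returncode

# The snippet asserts the "leaked" flag is absent, then sets it. Because
# every run gets a fresh interpreter, the mutation made by the first run
# is invisible to the second; both assertions pass.
snippet = (
    "import builtins; "
    "assert not hasattr(builtins, 'LEAKED'); "
    "builtins.LEAKED = True"
)
first = run_isolated(snippet)
second = run_isolated(snippet)  # would be non-zero if state leaked
```

If the ATSS failures disappear under per-file process isolation but reappear in a single pytest session, that points squarely at shared in-process state set up by test_engine_from_config.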

GalyaZalesskaya avatar Mar 20 '24 10:03 GalyaZalesskaya

@GalyaZalesskaya It appears that the device configuration in the explainable model is not set up properly. The inputs indicate a CPU device, but a check within _forward_explain_detection shows a cuda device.

[screenshot of the debugger session omitted]

A straightforward debugging step is to add an assertion such as assert inputs.device == next(self.buffers()).device. To address the issue directly, you could move the model to the inputs' device with self.to(inputs.device); however, I recommend not altering the device during the forward pass. Unfortunately, I can't offer a better solution, but it is likely related to how the explainable model patches the models, or the order in which they are patched, leading to device mismatches.
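The suggested guard can be sketched without torch. `Tensor` and `ExplainableHead` below are tiny stand-ins, not real torch or OTX classes; in the actual code the assertion would compare `inputs.device` with `next(self.buffers()).device` at the top of `_forward_explain_detection`:

```python
# Torch-free sketch of the suggested device-mismatch guard.
class Tensor:
    """Stand-in for a tensor that only tracks its device."""
    def __init__(self, device):
        self.device = device

class ExplainableHead:
    """Stand-in for a patched model head; self.device mimics the weight device."""
    def __init__(self, device="cuda"):
        self.device = device

    def to(self, device):
        self.device = device
        return self

    def forward(self, inputs):
        # Debugging aid: fail loudly at the first mismatch instead of deep
        # inside a convolution with a confusing dtype/device error.
        assert inputs.device == self.device, (
            f"device mismatch: inputs on {inputs.device}, "
            f"weights on {self.device}"
        )
        return "explained"

head = ExplainableHead(device="cuda")
# Workaround (explicitly discouraged above as something to do inside
# forward): align the model with the inputs' device before calling it.
head.to("cpu")
result = head.forward(Tensor("cpu"))
```

The value of the assertion is purely diagnostic: it moves the failure from a cryptic RuntimeError inside a layer to an explicit message at the boundary where the mismatch first becomes visible.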

eugene123tw avatar Mar 21 '24 16:03 eugene123tw