Different prediction with tensorrt on refinedet model for the version v0.18.0

Open YaYaB opened this issue 3 years ago • 3 comments

pred_trt_refinedet_issue.zip

Configuration

  • Version of DeepDetect:
    • [ ] Locally compiled on:
      • [ ] Ubuntu 18.04 LTS
      • [ ] Other:
    • [ ] Docker CPU
    • [X] Docker GPU
    • [ ] Amazon AMI
  • Commit (shown by the server when starting): 23bd913ac180b56eddbf90c71d1f2e8bc2310c54

Your question / the problem you're facing:

I am observing incorrect predictions with TensorRT and a RefineDet model on the latest version of DeepDetect (v0.18.0). The predictions are far off from the expected ones.

I have created a script to reproduce the issue. It runs predictions on DeepDetect versions v0.15.0 to v0.18.0, with and without TensorRT. It then dumps the predictions and computes a hash of each prediction file (keeping only the predictions list). The v0.18.0 TensorRT output is consistent neither with its Caffe counterpart nor with the TensorRT output of the previous versions.

Please set the following environment variables in the script and make sure that a GPU is available for testing:

BASE_PATH=TODO
LOGGING_FOLDER=TODO

Then simply launch the script:

bash pred_trt_refinedet_issue.sh

You should get the following output at the end (the docker logs are omitted here):

Here we compute the sha256sum of the predictions obtained.
For the caffe models nothing changes however we observe differences for the trt model of the last version of dd v0.18.0.
Compare deepdetect_gpu
PATH_LOGS/prediction_deepdetect_gpu_v0.15.0.json: 9e056b235be08f7245bdd324ac8ca756c41353771fcb3004df2f6b6347326d63  -
PATH_LOGS/prediction_deepdetect_gpu_v0.16.0.json: 9e056b235be08f7245bdd324ac8ca756c41353771fcb3004df2f6b6347326d63  -
PATH_LOGS/prediction_deepdetect_gpu_v0.17.0.json: 9e056b235be08f7245bdd324ac8ca756c41353771fcb3004df2f6b6347326d63  -
PATH_LOGS/prediction_deepdetect_gpu_v0.18.0.json: 9e056b235be08f7245bdd324ac8ca756c41353771fcb3004df2f6b6347326d63  -

Compare deepdetect_gpu_tensorrt
PATH_LOGS/prediction_deepdetect_gpu_tensorrt_v0.15.0.json: 51767470062ecba3d77e765c34bed6000cf175400d5ff59dda9b4727356f49b5  -
PATH_LOGS/prediction_deepdetect_gpu_tensorrt_v0.16.0.json: 51767470062ecba3d77e765c34bed6000cf175400d5ff59dda9b4727356f49b5  -
PATH_LOGS/prediction_deepdetect_gpu_tensorrt_v0.17.0.json: 51767470062ecba3d77e765c34bed6000cf175400d5ff59dda9b4727356f49b5  -
PATH_LOGS/prediction_deepdetect_gpu_tensorrt_v0.18.0.json: 1508b68447819ff281231ad5c757e88f4a651f50570115565438ac9fee88d566  -
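
For readers without the attachment, the hashing step could look roughly like the sketch below. It is not the attached script. It assumes each dumped file holds a DeepDetect predict response with the predictions under `body.predictions`, and the helper names `predictions_digest` and `group_by_digest` are made up for illustration. Because the serialization differs from the one used by the script, the digests will not match the values printed above. Only equality between versions is meaningful.

```python
import hashlib
import json


def predictions_digest(response: dict) -> str:
    """Return the SHA-256 hex digest of the predictions list only."""
    # Assumption: the predictions sit under body.predictions in the response.
    predictions = response["body"]["predictions"]
    # Sort keys and fix separators so that equal content always hashes equally,
    # regardless of key order or whitespace in the dumped file.
    canonical = json.dumps(predictions, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_by_digest(responses: dict) -> dict:
    """Map each digest to the sorted list of labels (e.g. versions) that produced it."""
    groups = {}
    for label, response in responses.items():
        groups.setdefault(predictions_digest(response), []).append(label)
    return {digest: sorted(labels) for digest, labels in groups.items()}
```

With one response per version, a backend whose versions all end up under a single digest is consistent. A version that lands alone under its own digest is the one to investigate.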

Expected predictions
[
  {
    "classes": [
      {
        "last": true,
        "bbox": {
          "ymax": 350.2694091796875,
          "xmax": 745.9049682617188,
          "ymin": 108.38544464111328,
          "xmin": 528.0482788085938
        },
        "prob": 0.9999849796295166,
        "cat": "1"
      }
    ],
    "uri": "https://icour.fr/ELeveSeconde/ajout/yann_lecum_vidal/images/yann_LeCun.jpg"
  }
]

Abnormal predictions for trt v0.18.0
[
  {
    "classes": [
      {
        "last": true,
        "bbox": {
          "ymax": 239.68505859375,
          "xmax": 425.599365234375,
          "ymin": 0,
          "xmin": 211.946044921875
        },
        "prob": 1,
        "cat": "1"
      }
    ],
    "uri": "https://icour.fr/ELeveSeconde/ajout/yann_lecum_vidal/images/yann_LeCun.jpg"
  }
]
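
To quantify how far off the v0.18.0 TensorRT box is, here is a small sketch that computes the intersection over union (IoU) of the two boxes reported above. The helper name `iou` is only illustrative. The coordinates are copied from the two prediction dumps.

```python
def iou(a: dict, b: dict) -> float:
    """Intersection over union of two boxes given as xmin/ymin/xmax/ymax dicts."""
    # Width and height of the overlap, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(a["xmax"], b["xmax"]) - max(a["xmin"], b["xmin"]))
    inter_h = max(0.0, min(a["ymax"], b["ymax"]) - max(a["ymin"], b["ymin"]))
    inter = inter_w * inter_h
    area_a = (a["xmax"] - a["xmin"]) * (a["ymax"] - a["ymin"])
    area_b = (b["xmax"] - b["xmin"]) * (b["ymax"] - b["ymin"])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Box from the expected predictions.
expected = {"xmin": 528.0482788085938, "ymin": 108.38544464111328,
            "xmax": 745.9049682617188, "ymax": 350.2694091796875}
# Box from the trt v0.18.0 predictions.
abnormal = {"xmin": 211.946044921875, "ymin": 0,
            "xmax": 425.599365234375, "ymax": 239.68505859375}

print(iou(expected, abnormal))  # 0.0: the two boxes do not overlap at all
```

The expected box starts at x ≈ 528 while the abnormal one ends at x ≈ 426, so the two detections are completely disjoint. This is not a small numerical drift.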

YaYaB avatar Aug 10 '21 11:08 YaYaB

Hi there, and thank you for the bug report

We were finally able to fix this here: https://github.com/jolibrain/deepdetect/pull/1329

This PR updates the TensorRT dependency to 8.0.x. Unfortunately, that version has a bug (https://forums.developer.nvidia.com/t/build-engine-error-when-use-pointnet-like-structure-and-tensorrt-8-0-1-6/183569/6) that affects SSD models.

Hopefully it will be fixed in the next TensorRT update, and everything should then work as expected.

fantes avatar Aug 18 '21 07:08 fantes

Thanks a lot, I'll try your fix! It could be a good idea to add unit tests based on expected values for different models' predictions to catch those, no?
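
As an illustration of such a test, here is a minimal sketch that compares a prediction list against stored expected values within a tolerance. It is not taken from the DeepDetect test suite. The function name `predictions_match` and the tolerance values are placeholders, and the sketch assumes that both lists report detections in the same order.

```python
def predictions_match(expected, actual, bbox_tol=1.0, prob_tol=1e-3):
    """Return True when both prediction lists agree within the tolerances."""
    if len(expected) != len(actual):
        return False
    for exp_pred, act_pred in zip(expected, actual):
        exp_classes, act_classes = exp_pred["classes"], act_pred["classes"]
        if len(exp_classes) != len(act_classes):
            return False
        for exp_cls, act_cls in zip(exp_classes, act_classes):
            # The predicted category must be identical.
            if exp_cls["cat"] != act_cls["cat"]:
                return False
            # Probabilities may differ slightly between backends.
            if abs(exp_cls["prob"] - act_cls["prob"]) > prob_tol:
                return False
            # Box coordinates (in pixels) must stay within bbox_tol.
            for key in ("xmin", "ymin", "xmax", "ymax"):
                if abs(exp_cls["bbox"][key] - act_cls["bbox"][key]) > bbox_tol:
                    return False
    return True


# Values copied from the two prediction dumps in this issue.
EXPECTED = [{"classes": [{"cat": "1", "prob": 0.9999849796295166, "bbox": {
    "xmin": 528.0482788085938, "ymin": 108.38544464111328,
    "xmax": 745.9049682617188, "ymax": 350.2694091796875}}]}]
TRT_V0_18_0 = [{"classes": [{"cat": "1", "prob": 1, "bbox": {
    "xmin": 211.946044921875, "ymin": 0,
    "xmax": 425.599365234375, "ymax": 239.68505859375}}]}]

assert predictions_match(EXPECTED, EXPECTED)
assert not predictions_match(EXPECTED, TRT_V0_18_0)
```

A tolerance-based comparison is used instead of a hash because the Caffe and TensorRT outputs already hash differently even when both are correct, so exact equality across backends would be too strict.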

YaYaB avatar Aug 18 '21 08:08 YaYaB

Indeed, we have a few tests (we need to add some more), but they are deactivated due to dependency problems (compatibility between versions of tensorrt, tensorrt-oss, cudnn, ubuntu and the corresponding docker images...). Hopefully we will be able to integrate/activate them with TRT 8.x.

fantes avatar Aug 18 '21 08:08 fantes