InsightFace-REST
Does forcing ONNX drop GPU support?
Hi,
When I force ONNX, I see the model is not using the GPU, only the CPU. Can this be made configurable?
Docker images are built with the CPU version of onnxruntime. Its intended use case is as a fallback when no GPU is available. You can install onnxruntime-gpu, though in its latest versions you also have to provide a GPU execution provider as an argument to onnxruntime.InferenceSession. But I highly recommend using TRT for GPU, since it's faster than onnxruntime and there are also some optimizations for image preprocessing in the TRT backend.
Is there any speed/accuracy difference between TRT and ONNX?
It would be good if, when we change trt to onnx in deploy.sh (in the GPU version), it automatically ran on the GPU (onnxruntime-gpu).
TRT is significantly faster, especially with fp16 inference (force_fp16=True) on GPUs that support it. There is some accuracy degradation, but embeddings computed with TRT and onnxruntime are usually about 0.99 similar.
It's trivial to add support for onnxruntime-gpu, but I'm not sure it's actually useful, since TRT performs much better and, as I said before, there are optimizations in the IFR code for TRT inference.
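If you want to check this yourself, you can compare the embeddings produced by the two backends directly. A minimal sketch, assuming the similarity is cosine similarity between embeddings of the same face, where `emb_trt` and `emb_onnx` are hypothetical vectors returned by the TRT and onnxruntime backends respectively:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

# emb_trt and emb_onnx are hypothetical embeddings of the same face,
# one computed with the TRT backend and one with onnxruntime:
# cosine_similarity(emb_trt, emb_onnx)  # typically ~0.99
```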
Identical accuracy is sometimes important. If I remove onnxruntime from pip, add onnxruntime-gpu, and also change trt to onnx, is that OK for deploy_trt?
You should also provide the CUDA execution provider (https://onnxruntime.ai/docs/execution-providers/) as an argument in the latest versions of onnxruntime.
Where exactly? In onnxrt_backend.py?
In all lines with onnxruntime.InferenceSession in onnxrt_backend.py.
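A minimal sketch of that change, assuming onnxruntime-gpu is installed; `model.onnx` stands in for the actual model path, and the real session creation in onnxrt_backend.py may pass additional options:

```python
import onnxruntime

# Prefer the CUDA execution provider, falling back to CPU if it is unavailable.
providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
session = onnxruntime.InferenceSession('model.onnx', providers=providers)

# The providers actually in use can be checked at runtime:
print(session.get_providers())
```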
done.
onnxruntime-gpu: 34 img/sec, TRT: 44 img/sec
That's pretty slow; what GPU and model parameters have you used?
RTX 2080 Ti
Default deploy parameters, 1 worker.
Try enabling force_fp16 then. I'm getting around 145-150 img/sec with one worker and 10 client threads with fp16 enabled on an RTX 2080 Super for Stallone.jpg.
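For measuring img/sec on your side, here is a minimal sketch of a multi-threaded benchmark client. The host/port, `/extract` endpoint, and request payload shape are assumptions; adjust them to match your actual deployment and API:

```python
import base64
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = 'http://localhost:18081/extract'  # assumed host/port and endpoint
N_THREADS = 10                          # matches the 10 client threads above
N_REQUESTS = 200

with open('Stallone.jpg', 'rb') as f:
    img_b64 = base64.b64encode(f.read()).decode()

payload = {'images': {'data': [img_b64]}}  # assumed request schema

def send(_):
    requests.post(URL, json=payload).raise_for_status()

start = time.time()
with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
    list(pool.map(send, range(N_REQUESTS)))
elapsed = time.time() - start
print(f'{N_REQUESTS / elapsed:.1f} img/sec')
```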