InsightFace-REST
Does forcing ONNX drop GPU support?
Hi,
When I force ONNX, I see the model is not using the GPU, only the CPU. Can this be made configurable?
Docker images are built with the CPU version of onnxruntime. Its intended use case is as a fallback when no GPU is available. You can install onnxruntime-gpu, though in its latest versions you also have to provide a GPU execution provider as an argument to onnxruntime.InferenceSession. But I highly recommend using TRT for GPU, since it's faster than onnxruntime and there are also some optimizations for image preprocessing in the TRT backend.
Is there any speed/accuracy difference between TRT and ONNX?
It would be good if, when we change trt to onnx in deploy.sh (in the GPU version), it automatically ran on the GPU (onnxruntime-gpu).
TRT is significantly faster, especially with fp16 inference (force_fp16=True) on GPUs that support it. There is some accuracy degradation, but embeddings computed with TRT and onnxruntime are usually about 0.99 similar.
It's trivial to add support for onnxruntime-gpu, but I'm not sure it's actually useful, since TRT performs much better and, as I said before, there are optimizations in the IFR code for TRT inference.
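If you want to check this yourself, you can compare the embeddings produced by the two backends directly. A minimal sketch, assuming the similarity is cosine similarity between embeddings of the same face, where `emb_trt` and `emb_onnx` are hypothetical vectors returned by the TRT and onnxruntime backends respectively:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

# emb_trt and emb_onnx are hypothetical embeddings of the same face,
# one computed with the TRT backend and one with onnxruntime:
# cosine_similarity(emb_trt, emb_onnx)  # typically ~0.99
```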
Identical accuracy is sometimes important. If I remove onnxruntime from pip, add onnxruntime-gpu, and also change trt to onnx, is that OK for deploy_trt?
You should also provide the CUDA execution provider (https://onnxruntime.ai/docs/execution-providers/) as an argument in the latest versions of onnxruntime.
Where exactly? In onnxrt_backend.py?
In all lines with onnxruntime.InferenceSession in onnxrt_backend.py.
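A minimal sketch of that change, assuming onnxruntime-gpu is installed; `model.onnx` stands in for the actual model path, and the real session creation in onnxrt_backend.py may pass additional options:

```python
import onnxruntime

# Prefer the CUDA execution provider, falling back to CPU if it is unavailable.
providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
session = onnxruntime.InferenceSession('model.onnx', providers=providers)

# The providers actually in use can be checked at runtime:
print(session.get_providers())
```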
done.
onnxruntime-gpu: 34 img/sec, TRT: 44 img/sec
That's pretty slow; what GPU and model parameters have you used?
RTX 2080 Ti
Default deploy parameters, 1 worker.
Try enabling force_fp16 then. I'm getting around 145-150 img/sec with one worker and 10 client threads with fp16 enabled on an RTX 2080 Super for Stallone.jpg.
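For measuring img/sec on your side, here is a minimal sketch of a multi-threaded benchmark client. The host/port, `/extract` endpoint, and request payload shape are assumptions; adjust them to match your actual deployment and API:

```python
import base64
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = 'http://localhost:18081/extract'  # assumed host/port and endpoint
N_THREADS = 10                          # matches the 10 client threads above
N_REQUESTS = 200

with open('Stallone.jpg', 'rb') as f:
    img_b64 = base64.b64encode(f.read()).decode()

payload = {'images': {'data': [img_b64]}}  # assumed request schema

def send(_):
    requests.post(URL, json=payload).raise_for_status()

start = time.time()
with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
    list(pool.map(send, range(N_REQUESTS)))
elapsed = time.time() - start
print(f'{N_REQUESTS / elapsed:.1f} img/sec')
```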