Kris Hung
Related PRs:
- common: https://github.com/triton-inference-server/common/pull/67
- backend: https://github.com/triton-inference-server/backend/pull/67
- tensorrt_backend: https://github.com/triton-inference-server/tensorrt_backend/pull/44
- [x] Validate CUDA SHM region size during registration (see the sketch after this list)
- [x] Add CUDA SHM registration tests
- [x] Refactor tests for CUDA SHM and System SHM
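A minimal sketch of what the registration-size validation guards against, assuming a local Triton server with the HTTP endpoint enabled and the `tritonclient` package (with CUDA shared-memory utilities) installed; the region name, byte sizes, and error-handling shape are illustrative, not taken from the PR:

```python
# Sketch: register a CUDA shared-memory region and exercise the size check.
# Assumes a Triton server listening on localhost:8000 and a CUDA-capable GPU.
import tritonclient.http as httpclient
import tritonclient.utils.cuda_shared_memory as cudashm
from tritonclient.utils import InferenceServerException

client = httpclient.InferenceServerClient(url="localhost:8000")
client.unregister_cuda_shared_memory()

# Allocate a 64-byte CUDA shared-memory region on GPU 0.
shm_handle = cudashm.create_shared_memory_region("input_data", 64, 0)
raw_handle = cudashm.get_raw_handle(shm_handle)

# Registering with the true allocation size succeeds.
client.register_cuda_shared_memory("input_data", raw_handle, 0, 64)
client.unregister_cuda_shared_memory("input_data")

# With the size validation in place, claiming a byte_size larger than the
# underlying allocation is expected to be rejected by the server.
try:
    client.register_cuda_shared_memory("input_data", raw_handle, 0, 128)
except InferenceServerException as err:
    print(f"registration rejected as expected: {err}")

cudashm.destroy_shared_memory_region(shm_handle)
```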
### Describe the issue

We are seeing a regression when using onnxruntime with the CUDA execution provider starting from version 1.14.1; versions before 1.14.1 are unaffected. We also...
Added tests for the specific use case of launching a separate thread for decoupled models, and fixed a small CI issue as well. PYBE: https://github.com/triton-inference-server/python_backend/pull/358
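For context, a minimal sketch of the decoupled-model pattern these tests exercise, assuming a Python backend model with `decoupled: true` in its `model_transaction_policy`; the tensor names (`IN`, `OUT`) and response logic are illustrative:

```python
# model.py -- decoupled Python backend model that sends responses from a
# separate thread instead of returning them from execute().
import threading

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            # Hand the request off to a worker thread; execute() returns
            # immediately and responses are streamed back later.
            worker = threading.Thread(target=self._respond, args=(request, sender))
            worker.daemon = True
            worker.start()
        # Decoupled models return None from execute().
        return None

    def _respond(self, request, sender):
        in_tensor = pb_utils.get_input_tensor_by_name(request, "IN")
        out_tensor = pb_utils.Tensor("OUT", in_tensor.as_numpy().astype(np.float32))
        sender.send(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        # Signal that no more responses will be sent for this request.
        sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
```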
Tested with OV 2024.X. This PR should be merged after ORT is upgraded to 1.18.