Kris Hung


Related PRs:
- common: https://github.com/triton-inference-server/common/pull/67
- backend: https://github.com/triton-inference-server/backend/pull/67
- tensorrt_backend: https://github.com/triton-inference-server/tensorrt_backend/pull/44

- [x] Validate CUDA SHM region size during registration
- [x] Add CUDA SHM registration tests
- [x] Refactor tests for CUDA SHM and System SHM
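The size check in the first item can be pictured as a guard run at registration time. The sketch below is a minimal illustration, assuming the server knows how many bytes are actually allocated from the region's start (for example via the CUDA driver's `cuMemGetAddressRange` on the opened pointer; whether the PR obtains it that way is not stated). `CudaShmRegion`, `RegistrationError`, and `validate_registration` are hypothetical names, not Triton's API.

```python
from dataclasses import dataclass


@dataclass
class CudaShmRegion:
    # Hypothetical record of a successfully registered CUDA SHM region.
    name: str
    device_id: int
    byte_size: int  # size the client declared at registration


class RegistrationError(ValueError):
    """Hypothetical error raised when a registration request is rejected."""


def validate_registration(name: str, device_id: int, requested_byte_size: int,
                          allocated_byte_size: int) -> CudaShmRegion:
    """Reject a registration whose declared size is non-positive or larger
    than the memory actually available from the region's start address.

    `allocated_byte_size` is an assumption of this sketch: the number of
    bytes the server has confirmed are backed by the CUDA allocation.
    """
    if requested_byte_size <= 0:
        raise RegistrationError(f"region '{name}': byte_size must be positive")
    if requested_byte_size > allocated_byte_size:
        # Accepting this would let later reads/writes run past the allocation.
        raise RegistrationError(
            f"region '{name}': requested {requested_byte_size} bytes but only "
            f"{allocated_byte_size} bytes are allocated")
    return CudaShmRegion(name, device_id, requested_byte_size)
```

Validating once at registration means every later inference request that references the region can trust the recorded `byte_size` instead of re-checking the device allocation.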

### Describe the issue

We are seeing a regression when using onnxruntime with the CUDA execution provider, starting with version 1.14.1. Before version 1.14.1, there was no regression. We also...

ep:CUDA

Adds testing for the specific use case of a decoupled model that launches a separate thread to send its responses, and fixes a small CI issue. Related python_backend (PYBE) PR: https://github.com/triton-inference-server/python_backend/pull/358
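The pattern under test can be sketched without a running server. In python_backend, a decoupled model obtains a sender with `request.get_response_sender()`, streams responses through it, and closes the stream with the `pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL` flag. The minimal sketch below replaces those pieces with a hypothetical `StubResponseSender` and `FINAL_FLAG` so it runs standalone; `SketchModel` is illustrative and is not the test added in the PR.

```python
import threading
from typing import Any, List

FINAL_FLAG = 1  # stand-in for pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL


class StubResponseSender:
    """Hypothetical stand-in for the object returned by request.get_response_sender()."""

    def __init__(self) -> None:
        self.responses: List[Any] = []
        self.flags: List[int] = []
        self._lock = threading.Lock()

    def send(self, response: Any = None, flags: int = 0) -> None:
        with self._lock:
            self.responses.append(response)
            self.flags.append(flags)


def _produce(sender: StubResponseSender, count: int) -> None:
    # Runs on the worker thread: stream `count` responses, then close the stream.
    for i in range(count):
        sender.send(i)
    sender.send(flags=FINAL_FLAG)


class SketchModel:
    def __init__(self) -> None:
        self._threads: List[threading.Thread] = []

    def execute(self, senders: List[StubResponseSender], count: int = 3) -> None:
        # Hand each request's sender to its own thread and return immediately;
        # a decoupled model delivers responses through the sender, not the return value.
        for sender in senders:
            t = threading.Thread(target=_produce, args=(sender, count), daemon=True)
            t.start()
            self._threads.append(t)
        return None

    def finalize(self) -> None:
        # Wait for outstanding worker threads before the model is unloaded.
        for t in self._threads:
            t.join()
```

Joining the workers in `finalize()` keeps a thread from outliving the model that owns its sender.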

Tested with OpenVINO (OV) 2024.X. This PR should be merged after ORT is upgraded to 1.18.