Kris Hung
Related PRs:
- common: https://github.com/triton-inference-server/common/pull/67
- backend: https://github.com/triton-inference-server/backend/pull/67
- tensorrt_backend: https://github.com/triton-inference-server/tensorrt_backend/pull/44
- [x] Validate CUDA SHM region size during registration (see the sketch after this list)
- [x] Add CUDA SHM registration tests
- [x] Refactor tests for CUDA SHM and System SHM
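A minimal sketch of what the registration-size validation guards against, assuming a local Triton server with the HTTP endpoint enabled and the `tritonclient` package (with CUDA shared-memory utilities) installed; the region name, byte sizes, and error-handling shape are illustrative, not taken from the PR:

```python
# Sketch: register a CUDA shared-memory region and exercise the size check.
# Assumes a Triton server listening on localhost:8000 and a CUDA-capable GPU.
import tritonclient.http as httpclient
import tritonclient.utils.cuda_shared_memory as cudashm
from tritonclient.utils import InferenceServerException

client = httpclient.InferenceServerClient(url="localhost:8000")
client.unregister_cuda_shared_memory()

# Allocate a 64-byte CUDA shared-memory region on GPU 0.
shm_handle = cudashm.create_shared_memory_region("input_data", 64, 0)
raw_handle = cudashm.get_raw_handle(shm_handle)

# Registering with the true allocation size succeeds.
client.register_cuda_shared_memory("input_data", raw_handle, 0, 64)
client.unregister_cuda_shared_memory("input_data")

# With the size validation in place, claiming a byte_size larger than the
# underlying allocation is expected to be rejected by the server.
try:
    client.register_cuda_shared_memory("input_data", raw_handle, 0, 128)
except InferenceServerException as err:
    print(f"registration rejected as expected: {err}")

cudashm.destroy_shared_memory_region(shm_handle)
```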
### Describe the issue

We are seeing a regression when using onnxruntime with the CUDA execution provider starting from version 1.14.1; versions before 1.14.1 are unaffected. We also...
Added tests for the specific use case of launching a separate thread for decoupled models, and fixed a small CI issue as well. PYBE: https://github.com/triton-inference-server/python_backend/pull/358
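For context, a minimal sketch of the decoupled-model pattern these tests exercise, assuming a Python backend model with `decoupled: true` in its `model_transaction_policy`; the tensor names (`IN`, `OUT`) and response logic are illustrative:

```python
# model.py -- decoupled Python backend model that sends responses from a
# separate thread instead of returning them from execute().
import threading

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            # Hand the request off to a worker thread; execute() returns
            # immediately and responses are streamed back later.
            worker = threading.Thread(target=self._respond, args=(request, sender))
            worker.daemon = True
            worker.start()
        # Decoupled models return None from execute().
        return None

    def _respond(self, request, sender):
        in_tensor = pb_utils.get_input_tensor_by_name(request, "IN")
        out_tensor = pb_utils.Tensor("OUT", in_tensor.as_numpy().astype(np.float32))
        sender.send(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        # Signal that no more responses will be sent for this request.
        sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
```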
Tested with OV 2024.X. This PR should be merged after ORT is upgraded to 1.18.