server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Related PRs:
- common: https://github.com/triton-inference-server/common/pull/67
- backend: https://github.com/triton-inference-server/backend/pull/67
- tensorrt_backend: https://github.com/triton-inference-server/tensorrt_backend/pull/44
I looked at the code and found that the OpenVINO backend calls `Infer_request_.infer()`, which is synchronous mode. Can asynchronous mode be supported?
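To illustrate the difference the issue is asking about, here is a minimal sketch (not the OpenVINO backend's actual C++ code) of synchronous versus asynchronous request handling using Python's `concurrent.futures`. The `infer` function is a hypothetical stand-in for a model's forward pass:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def infer(x):
    # Hypothetical stand-in for a model forward pass; the sleep mimics compute time.
    time.sleep(0.01)
    return x * 2

def infer_sync(batch):
    # Synchronous mode: each request blocks until the previous one finishes,
    # so total latency grows linearly with the number of requests.
    return [infer(x) for x in batch]

def infer_async(batch, workers=4):
    # Asynchronous mode: requests are submitted without blocking and collected
    # after all have completed, so independent requests can overlap.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(infer, x) for x in batch]
        return [f.result() for f in futures]

results = infer_async([1, 2, 3, 4])
```

The asynchronous variant returns the same results in the same order; only the scheduling differs.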
**Description** We have an app that periodically (every 15 seconds) queries `triton-server` for liveness (`ServerLive` request), readiness (`ServerReady` request) and models info (`RepositoryIndex` request). For the past weeks, every 1...
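A periodic health check like the one described can be written against any object exposing the same methods as `tritonclient.http.InferenceServerClient` (`is_server_live`, `is_server_ready`, `get_model_repository_index`). The sketch below takes the client as a parameter so it can be exercised without a running server; the client construction in the comment is an assumption about the caller's setup:

```python
def check_server(client):
    """Collect liveness, readiness, and model-repository info in one pass.

    `client` is expected to expose the same methods as
    tritonclient.http.InferenceServerClient, e.g. one built with
    InferenceServerClient(url="localhost:8000").
    """
    return {
        "live": client.is_server_live(),        # ServerLive request
        "ready": client.is_server_ready(),      # ServerReady request
        "models": client.get_model_repository_index(),  # RepositoryIndex request
    }
```

Calling `check_server` every 15 seconds from a scheduler loop reproduces the polling pattern from the report.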
**Description** Context: Trying to load a tensorflow saved model which has multiple signature definitions. Problem: The signature definition selected is not used. This happens randomly, when loading the container multiple...
Refactor L0_infer so that future in-process APIs can use a similar model-generation scheme
How can Triton be run without Docker on Windows 10? Are there any guidelines?
**Description** Thanks for this remarkable work. I deploy a model with a variable besides the input tensor, so I want to send this variable via query_params during each infer request. But I...
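At the HTTP level, Triton's inference endpoint is `POST /v2/models/<name>[/versions/<version>]/infer`, and per-request values can be appended as a query string. Here is a small sketch of building such a URL; the parameter names are hypothetical and how the server interprets them depends on the backend:

```python
from urllib.parse import urlencode

def build_infer_url(base, model, version=None, params=None):
    # Triton's KServe-style HTTP inference path; extra per-request
    # values (hypothetical here) are appended as a query string.
    path = f"{base}/v2/models/{model}"
    if version is not None:
        path += f"/versions/{version}"
    path += "/infer"
    if params:
        path += "?" + urlencode(params)
    return path
```

For example, `build_infer_url("http://localhost:8000", "my_model", params={"scale": "0.5"})` yields a URL carrying `scale=0.5` alongside the request body.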
**Description** > I have a model built on TensorFlow v1.15 (cuda=10.0, cudnn=7.4.1). I would like to register this model with the current Triton server (v22.03); the default CUDA version is 11.6....
**Is your feature request related to a problem? Please describe.** We need to run inference with multiple models on the same data. The data is streamed, and real-time performance...
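One way Triton supports feeding the same data to multiple models is an ensemble model, whose scheduling is declared in `config.pbtxt`. The fragment below is a sketch with hypothetical model and tensor names (`model_a`, `model_b`, `RAW_INPUT`, etc.), where both steps consume the same ensemble input:

```protobuf
name: "two_model_ensemble"
platform: "ensemble"
max_batch_size: 8
input [
  { name: "RAW_INPUT" data_type: TYPE_FP32 dims: [ 3, 224, 224 ] }
]
output [
  { name: "OUTPUT_A" data_type: TYPE_FP32 dims: [ 1000 ] },
  { name: "OUTPUT_B" data_type: TYPE_FP32 dims: [ 10 ] }
]
ensemble_scheduling {
  step [
    {
      model_name: "model_a"
      model_version: -1
      input_map { key: "INPUT" value: "RAW_INPUT" }
      output_map { key: "OUTPUT" value: "OUTPUT_A" }
    },
    {
      model_name: "model_b"
      model_version: -1
      input_map { key: "INPUT" value: "RAW_INPUT" }
      output_map { key: "OUTPUT" value: "OUTPUT_B" }
    }
  ]
}
```

A single client request to the ensemble then fans out to both models, avoiding a duplicate transfer of the streamed data.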