GuanLuo
core: https://github.com/triton-inference-server/core/pull/109

Example: if a model has the config below and Triton is built with ENABLE_GPU=OFF:

```
name: "add_sub"
backend: "python"
input [ ... ]
output [ ... ]
# not providing instance group...
```
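For context, my reading of the change (not a verbatim excerpt from the PR) is that a config with no instance group falls back to CPU instances when the build has no GPU support; an explicit equivalent of that default would look roughly like this:

```
# Illustrative only: the explicit instance group that the implicit default is
# assumed to resolve to on an ENABLE_GPU=OFF build (count shown as 1 for clarity).
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
```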
**Description**

The generation first looks for the ["CUDNN_VERSION" environment variable on the host system](https://github.com/triton-inference-server/onnxruntime_backend/blob/main/tools/gen_ort_dockerfile.py#L429-L435), and only later uses the [version in the docker image](https://github.com/triton-inference-server/onnxruntime_backend/blob/main/tools/gen_ort_dockerfile.py#L94-L98). CUDNN ships with the docker image so it may...
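A minimal sketch of that lookup order follows; the variable and function names are illustrative, not the ones actually used in gen_ort_dockerfile.py:

```
import os

# Hypothetical stand-in for the CUDNN version pinned inside the docker image.
IMAGE_CUDNN_VERSION = "X.Y.Z"

def choose_cudnn_version():
    # 1. Prefer the CUDNN_VERSION environment variable on the host, if set.
    host_version = os.environ.get("CUDNN_VERSION")
    if host_version:
        return host_version
    # 2. Otherwise fall back to the version that ships with the docker image.
    return IMAGE_CUDNN_VERSION
```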
# Ask a Question

### Question

I was toggling the node parameters for testing and I noticed that the ONNX checker doesn't complain about the following model whose output has...
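For reference, here is a self-contained way to run the checker on a toy model (this is not the model from the question, which is truncated above):

```
import onnx
from onnx import helper, TensorProto

# Build a minimal single-node graph just to show how the checker is invoked.
x = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 4])
y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 4])
node = helper.make_node("Identity", ["X"], ["Y"])
graph = helper.make_graph([node], "toy_graph", [x], [y])
model = helper.make_model(graph)

# check_model raises onnx.checker.ValidationError when the model is invalid
# and stays silent otherwise.
onnx.checker.check_model(model)
```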
Follow up on https://github.com/triton-inference-server/core/pull/229

For a custom backend, one may send responses in the following style, even for a "non-decoupled" model:

```
TRITONBACKEND_ResponseSend(response, 0, nullptr /* success */);
TRITONBACKEND_ResponseFactorySendFlags(factory, TRITONSERVER_RESPONSE_COMPLETE_FINAL);
```
...
This PR provides the low-level binding for Python users to interact with the Triton library within the same process; however, this binding is not intended for Python users to use...
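To give a feel for what "low level" means here, the sketch below is purely hypothetical: the module and names are assumptions that mirror the shape of the TRITONSERVER_* C API, not the binding added by this PR:

```
# Hypothetical names throughout; this only illustrates a thin, C-API-shaped
# binding rather than the actual interface introduced in the PR.
import triton_low_level as t  # assumed low-level module name

options = t.ServerOptions()
options.set_model_repository_path("/models")

server = t.Server(options)  # Triton runs inside the current Python process

# The caller is responsible for C-API-style bookkeeping (explicitly stopping
# and releasing handles), which is why a higher-level wrapper is expected
# for end users instead of this binding.
server.stop()
```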
`FIXME` marks the sections where further discussion is desired. The wrapper is restrictive in that a lot of the interaction with the in-process API is pre-defined (i.e. how to handle released `TRITONSERVER_Request` and...
@wphicks this PR is sourced from the main branch of the rapids-triton repo; I removed [`squeeze_output`](https://github.com/rapidsai/rapids-triton/blob/main/cpp/include/rapids_triton/model/shared_state.hpp#L74-L81) as it seems to exist only for backward compatibility. Other than that, all changes are...
#### What does the PR do?

Add unit tests to constrain the behavior of the shared memory utilities; a rough idea of such a test is sketched after the checklist.

#### Checklist

- [x] PR title reflects the change and is of format...
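As a rough illustration of the kind of round-trip behavior such tests pin down, here is a sketch assuming the client-side system shared memory helpers in `tritonclient.utils.shared_memory`; the PR itself may target a different module:

```
import unittest
import numpy as np
import tritonclient.utils.shared_memory as shm

class SharedMemoryRoundTripTest(unittest.TestCase):
    # Assumption: the utilities under test behave like the client-side
    # system shared memory helpers used below.
    def test_write_then_read_back(self):
        data = np.arange(8, dtype=np.float32)
        handle = shm.create_shared_memory_region("test_region", "/test_key", data.nbytes)
        try:
            shm.set_shared_memory_region(handle, [data])
            result = shm.get_contents_as_numpy(handle, np.float32, [8])
            np.testing.assert_array_equal(result, data)
        finally:
            shm.destroy_shared_memory_region(handle)

if __name__ == "__main__":
    unittest.main()
```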