pytorch_backend
The Triton backend for the PyTorch TorchScript models.
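For context, a TorchScript model is served by placing a `model.pt` in a model repository directory alongside a `config.pbtxt`. A minimal sketch (the model name, tensor shapes, and datatypes here are illustrative, assuming the backend's conventional `INPUT__<index>`/`OUTPUT__<index>` tensor naming):

```
name: "resnet_torchscript"
platform: "pytorch_libtorch"
max_batch_size: 8
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```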
- [x] basic support
- [x] test with dict inputs
- [x] test with list inputs
- [x] test default method name ("forward")
- [x] test custom method name
- ...
We've occasionally had issues when running multiple models that use cuDNN (we sometimes see `CUDNN_INTERNAL_ERROR`, and sometimes GPU memory spikes when running a cuDNN kernel), so we have found it...
This PR permits setting a max_gpu_fraction for the PyTorch backend. PyTorch allows setting the max GPU fraction through the [`CUDACachingAllocator`](https://github.com/pytorch/pytorch/blob/main/c10/cuda/CUDACachingAllocator.h). The user of the pytorch_backend can set the...
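If the PR follows the backend's existing `parameters` convention in `config.pbtxt`, setting the fraction might look roughly like the sketch below. The key name `MAX_GPU_FRACTION` is a hypothetical placeholder for illustration, not taken from the PR:

```
parameters: {
  # hypothetical key name; the actual parameter name is defined by the PR
  key: "MAX_GPU_FRACTION"
  value: { string_value: "0.5" }
}
```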
When `INFERENCE_MODE` is set to false, the model still runs in `no_grad` mode. Is that intentional? This prevents serving models that require gradients at inference time, such as...
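For reference, `INFERENCE_MODE` is toggled per model via the backend's `parameters` block in `config.pbtxt`; a minimal sketch of disabling it (which, per the comment above, still leaves `no_grad` active):

```
parameters: {
  key: "INFERENCE_MODE"
  value: { string_value: "false" }
}
```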
Updating `github/actions` to mitigate an issue with a conflict between `pre-commit` and `setup-python`.
This change breaks the monolithic src/libtorch.cc into multiple files, with a modern separation of classes into separate header and code files. No actual code changes were made besides the separation...
Relates to https://github.com/triton-inference-server/server/issues/7853. Update https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#datatypes after this is done.
Hi Team, We are working on adding Power(ppc64le) support to triton-inference-server with pytorch_backend. We were able to build pytorch_backend for CPU using the instructions mentioned here: https://github.com/triton-inference-server/pytorch_backend?tab=readme-ov-file#build-the-pytorch-backend-with-custom-pytorch. Please let us...
This PR introduces a new model configuration parameter, `ENABLE_DETERMINISTIC_ALGORITHMS`, to control whether PyTorch runs with deterministic algorithms enabled. * Adds `enable_deterministic_algorithms_` flag in `ModelState` * Parses the flag from model...
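Assuming the new flag follows the same `parameters` convention as the backend's other model-level options (and maps onto PyTorch's `torch.use_deterministic_algorithms`), enabling it in `config.pbtxt` would look roughly like:

```
parameters: {
  key: "ENABLE_DETERMINISTIC_ALGORITHMS"
  value: { string_value: "true" }
}
```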