Nimrod Rak
I am trying to build a pipelined inference server with a mostly Python backend (it sometimes runs PyTorch models directly in the code). Originally I had the entire pipeline run...
I am attempting to implement a custom op using CUDA kernels and started looking into the existing guides and how-tos. The simplest and easiest...
I am trying to optimize T5-small inference using FasterTransformer. I am running on a single V100; I followed all the steps in `t5_guide.md` exactly and got a...