
Update operators to keep tensors on GPU between ops (where possible)

Open · karlhigley opened this issue 2 years ago · 1 comment

  • [ ] Create a numpy/cupy dispatch mechanism (like pandas/cudf in NVT)
  • [ ] Apply DLpack to pass GPU tensors from Python back-end to other models
  • [ ] Update FilterCandidates
  • [ ] Update SoftmaxSampling
  • [ ] Update Faiss and Feast ops to convert to GPU?

karlhigley · Mar 02 '22 20:03
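The first two checklist items can be sketched in a minimal form, assuming CuPy is the optional GPU backend. `array_module` picks numpy or cupy for its inputs, delegating to CuPy's `cupy.get_array_module` when CuPy is installed. `from_dlpack` wraps an incoming DLPack-capable tensor without a copy, on whatever device it already lives on (this assumes NumPy 1.23+ for `np.from_dlpack` and CuPy 10+ for `cupy.from_dlpack`). `filter_candidates` and `softmax` are hypothetical stand-ins for the FilterCandidates and SoftmaxSampling ops, not their actual implementations. They only show that an op written against `xp` never has to move data to the host.

```python
import numpy as np

try:
    import cupy as cp  # optional GPU backend; absent on CPU-only installs
except ImportError:
    cp = None

DLPACK_CUDA = 2  # kDLCUDA device type in the DLPack spec


def array_module(*arrays):
    """Return cupy if any argument is a cupy array, otherwise numpy."""
    if cp is not None:
        return cp.get_array_module(*arrays)
    return np


def from_dlpack(tensor):
    """Wrap a DLPack-capable tensor zero-copy, keeping it on its current device."""
    device_type, _device_id = tensor.__dlpack_device__()
    if cp is not None and device_type == DLPACK_CUDA:
        return cp.from_dlpack(tensor)
    return np.from_dlpack(tensor)


def filter_candidates(candidates, seen):
    """Hypothetical op: drop candidate IDs that appear in `seen`, on the input's device."""
    xp = array_module(candidates, seen)
    return candidates[~xp.isin(candidates, seen)]


def softmax(scores):
    """Hypothetical op: numerically stable softmax, on the input's device."""
    xp = array_module(scores)
    exp = xp.exp(scores - xp.max(scores))
    return exp / exp.sum()
```

For example, `filter_candidates(np.array([1, 2, 3, 4]), np.array([2, 4]))` returns `array([1, 3])`; passing cupy arrays instead would run the same code on the GPU and return a cupy array.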

Depending on how this turns out, we may or may not find it worthwhile to add a graph optimizer that condenses multiple operators into a single TritonPythonModel. Doing so would still help us avoid the scheduling overhead of passing requests between models, but it might not be a big boost if combining operators no longer saves us GPU-CPU round-trip conversions.

karlhigley · Mar 02 '22 21:03
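Condensing operators can be illustrated with a minimal sketch, assuming each operator is a plain callable from a dict of column arrays to a dict of column arrays. `fuse`, `scale_scores`, and `top_k` are hypothetical names, not Merlin Systems APIs. The fused callable runs the ops back to back inside one model invocation, so intermediate arrays never cross a model boundary. Whether they also stay on the GPU still depends on the ops themselves, which is the trade-off described in the comment.

```python
from functools import reduce
from typing import Callable, Dict, Sequence

import numpy as np

Batch = Dict[str, np.ndarray]
Op = Callable[[Batch], Batch]


def fuse(ops: Sequence[Op]) -> Op:
    """Compose operators into a single callable that applies them in order."""
    def fused(batch: Batch) -> Batch:
        # Each op's output feeds straight into the next op, in-process.
        return reduce(lambda acc, op: op(acc), ops, batch)
    return fused


def scale_scores(batch: Batch) -> Batch:
    # Hypothetical op: rescale the "scores" column.
    return {**batch, "scores": batch["scores"] * 2.0}


def top_k(k: int) -> Op:
    # Hypothetical op factory: keep the k highest-scoring candidates.
    def op(batch: Batch) -> Batch:
        order = np.argsort(-batch["scores"])[:k]
        return {name: col[order] for name, col in batch.items()}
    return op


pipeline = fuse([scale_scores, top_k(2)])
result = pipeline({"ids": np.array([10, 20, 30]), "scores": np.array([0.1, 0.9, 0.5])})
```

Here `result["ids"]` is `[20, 30]`: the two ops ran as one step, with no serialization between them.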