Update operators to keep tensors on the GPU between ops (where possible)
- [ ] Create a numpy/cupy dispatch mechanism (like pandas/cudf in NVT)
- [ ] Use DLPack to pass GPU tensors from the Python backend to other models
- [ ] Update FilterCandidates
- [ ] Update SoftmaxSampling
- [ ] Update Faiss and Feast ops to convert to GPU?
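A numpy/cupy dispatch mechanism along the lines of the first checkbox might look like the following minimal sketch. This mirrors the pandas/cudf pattern used in NVTabular; the helper names (`array_module`, `softmax`) are illustrative, not existing Merlin APIs:

```python
import numpy as np

# cupy is optional: fall back to numpy-only behavior on CPU machines
try:
    import cupy as cp
    HAS_CUPY = True
except ImportError:
    cp = None
    HAS_CUPY = False


def array_module(tensor):
    """Return the array library (numpy or cupy) that owns `tensor`."""
    if HAS_CUPY and isinstance(tensor, cp.ndarray):
        return cp
    return np


def softmax(scores):
    """Compute a softmax with whichever backend `scores` lives on,
    so GPU tensors never round-trip through host memory."""
    xp = array_module(scores)
    exp = xp.exp(scores - scores.max())  # shift for numerical stability
    return exp / exp.sum()
```

An operator like SoftmaxSampling could then call `softmax` on whatever array it receives and stay on-device for free.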
Depending on how this turns out, we may or may not find it worthwhile to add a graph optimizer that condenses multiple operators into a single TritonPythonModel. That would still help us avoid the scheduling overhead of passing requests between models, but it might not be a big boost if combining operators no longer helps us avoid GPU-CPU round-trip conversions.
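For reference, the DLPack hand-off mentioned above uses the standard `__dlpack__` protocol, which both numpy (1.22+) and cupy implement. Numpy stands in for the GPU library in this sketch so it runs anywhere; with cupy arrays the same call moves tensors between frameworks without a host round trip:

```python
import numpy as np

# DLPack hand-off sketch: export an array via the DLPack protocol and
# re-import it zero-copy. With cupy, `cp.from_dlpack(src)` works the
# same way on device memory.
src = np.arange(4, dtype=np.float32)
dst = np.from_dlpack(src)  # zero-copy: shares src's underlying buffer

assert np.shares_memory(src, dst)  # same memory, no copy occurred
```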