Why does TF Serving use one CUDA compute stream?

ndeep27 opened this issue 1 year ago • 1 comment

I'm trying to understand why TF uses a single CUDA compute stream. Is there a metric that shows whether ops are waiting to be scheduled on that one compute stream? I want to know whether ops are queuing up in high-QPS scenarios.

May 06 '24 22:05 ndeep27

@ndeep27, this does not look like an issue on the TensorFlow Serving side. Since it is not a bug or feature request, the question is better asked on the TensorFlow Forum, where a larger community reads questions. Thank you!

May 08 '24 08:05 singhniraj08

This issue has been marked stale because it has had no activity in the last 7 days. It will be closed if no further activity occurs. Thank you.

May 16 '24 01:05 github-actions[bot]

This issue was closed due to lack of activity after being marked stale for the past 7 days.

May 24 '24 01:05 github-actions[bot]

