Why does TF Serving use one CUDA compute stream?
I'm trying to understand why TF uses a single CUDA compute stream. Is there a metric that shows whether ops are waiting to be scheduled on that one compute stream? I want to find out whether ops are queuing up in high-QPS scenarios.
@ndeep27, this does not look like an issue on the TensorFlow Serving side. This question is better asked on the TensorFlow Forum, since it is not a bug or feature request. There is also a larger community that reads questions there. Thank you!
This issue has been marked stale because it has had no activity in the past 7 days. It will be closed if no further activity occurs. Thank you.
This issue was closed due to lack of activity after being marked stale for the past 7 days.