Support decoupled mode in perf_analyzer
When I use perf_analyzer to analyze a Python decoupled model (like triton-decoupled) with the command below:
perf_analyzer -i grpc --streaming -m repeat --concurrency-range 1:2 -vv --input-data repeat_data.json
where repeat_data.json contains:
{
"data" :
[
{
"IN" : [5]
}
]
}
Here, 5 is the number of responses the server (in decoupled mode) will send to the client.
The run fails with the following error:
No valid requests recorded within time interval. Please use a larger time window.
It seems that perf_analyzer sends a new request as soon as it receives a response. In decoupled mode, however, perf_analyzer should not send another request until it has received the final response from the decoupled model.
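A minimal Python sketch of the client behavior requested above, assuming a simulated server instead of the real Triton client API. `Response`, its `final` flag, and `fake_decoupled_server` are hypothetical stand-ins for a decoupled model that sends a fixed number of responses per request and marks the last one. The client keeps exactly one request in flight and only sends the next request after the final response arrives.

```python
import queue
import threading
import time
from dataclasses import dataclass


@dataclass
class Response:
    request_id: int
    index: int
    final: bool  # hypothetical "last response for this request" flag


def fake_decoupled_server(request_id: int, num_responses: int, out: queue.Queue) -> None:
    # Hypothetical stand-in for a decoupled model: sends num_responses responses
    # for a single request and marks the last one as final (assumes num_responses >= 1).
    for i in range(num_responses):
        time.sleep(0.001)  # simulated per-response work
        out.put(Response(request_id, i, final=(i == num_responses - 1)))


def run_closed_loop(num_requests: int, num_responses: int) -> list[tuple[int, int]]:
    # Concurrency-1 client: drain each request's responses until the final one
    # arrives, and only then send the next request.
    results = []
    for request_id in range(num_requests):
        out: queue.Queue = queue.Queue()
        worker = threading.Thread(
            target=fake_decoupled_server, args=(request_id, num_responses, out)
        )
        worker.start()
        seen = 0
        while True:
            resp = out.get(timeout=5.0)  # raises queue.Empty if the stream stalls
            if resp.request_id != request_id:
                raise RuntimeError("response for an unexpected request")
            seen += 1
            if resp.final:
                break
        worker.join()
        results.append((request_id, seen))
    return results


if __name__ == "__main__":
    print(run_closed_loop(num_requests=3, num_responses=5))
    # prints [(0, 5), (1, 5), (2, 5)]
```

The loop advances on the final flag rather than on the first response, so a request that produces five responses is counted as one completed request only after all five have arrived.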
@Jackiexiao Thanks for your feature request. Perf Analyzer has limited support for decoupled mode: it measures the time between the request and the first response. However, as you mentioned in the issue, this is not suitable for every combination that can be expressed with decoupled mode. I'll mark this ticket as an enhancement.
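To illustrate the distinction in the reply, here is a hedged Python sketch that computes both the request-to-first-response latency and a request-to-final-response latency covering the whole stream. `RequestTrace`, `summarize`, and the helper functions are hypothetical and not part of perf_analyzer; timestamps are assumed to be recorded client-side in milliseconds.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class RequestTrace:
    send_ms: float  # client-side time the request was sent
    response_ms: list[float] = field(default_factory=list)  # arrival time of each response, in order


def first_response_latency(trace: RequestTrace) -> float:
    # Request -> first response: what Perf Analyzer measures for decoupled models, per the maintainer's reply.
    return trace.response_ms[0] - trace.send_ms


def final_response_latency(trace: RequestTrace) -> float:
    # Request -> final response: covers the whole response stream.
    return trace.response_ms[-1] - trace.send_ms


def summarize(traces: list[RequestTrace]) -> dict[str, float]:
    # Assumes traces is non-empty and every trace has at least one response.
    return {
        "avg_first_response_ms": float(mean(first_response_latency(t) for t in traces)),
        "avg_final_response_ms": float(mean(final_response_latency(t) for t in traces)),
        "avg_responses_per_request": float(mean(len(t.response_ms) for t in traces)),
    }


if __name__ == "__main__":
    traces = [
        RequestTrace(send_ms=0.0, response_ms=[10.0, 20.0, 50.0]),
        RequestTrace(send_ms=100.0, response_ms=[120.0, 130.0, 140.0]),
    ]
    print(summarize(traces))
    # prints {'avg_first_response_ms': 15.0, 'avg_final_response_ms': 45.0, 'avg_responses_per_request': 3.0}
```

For a model that streams several responses per request, the two averages can differ widely, which is why first-response latency alone does not describe every decoupled workload.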