Sort input/output in PreProcessPrediction
In direct_session.cc (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/direct_session.cc#L1514), every new key is emplaced into executors_, so a large number of keys accumulates in the map and memory usage grows accordingly.
With 10 input tensors there can be up to 10! = 3,628,800 distinct keys, so the memory usage can become very large; see the sketch after the code snippet below.
// See if we already have the executors for this run.
{
  mutex_lock l(executor_lock_);
  auto it = executors_.find(sorted_key);
  if (it != executors_.end()) {
    *executors_and_keys = it->second.get();
    // Insert this under the original key.
    executors_.emplace(key, it->second);
    return Status::OK();
  }
}
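To illustrate the blow-up, here is a hypothetical, simplified sketch (not the actual DirectSession code; MakeKey and the cache type are invented for illustration). It assumes the executor cache key is derived from the feed names in the order they arrive, so the same feeds sent in a different order produce a different key and a new cache entry:

#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the executor cache key: concatenate the feed
// names in whatever order the caller passed them.
std::string MakeKey(const std::vector<std::string>& feeds) {
  std::string key;
  for (const auto& feed : feeds) {
    key += feed + ",";
  }
  return key;
}

int main() {
  std::unordered_map<std::string, int> cache;  // stand-in for executors_
  // The same three feeds, sent in two different orders.
  cache.emplace(MakeKey({"feature1", "feature2", "feature3"}), 0);
  cache.emplace(MakeKey({"feature3", "feature1", "feature2"}), 0);
  // Two distinct keys for logically identical requests; with 10 feeds there
  // are up to 10! = 3,628,800 possible orderings, hence up to 10! entries.
  std::cout << cache.size() << " cache entries" << std::endl;  // prints 2
  return 0;
}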
Please check the attached file: serving_nommap.0042.heap.base0007.pdf
Also see issue https://github.com/tensorflow/serving/issues/1215.
I'm not sure whether fixing the TF code or the TF Serving code is the better option, so I also submitted https://github.com/tensorflow/tensorflow/pull/39743; please help review both.
Thank you. We used the code from this PR and it solved the out-of-memory problem in our production system.
Thanks for the bug report and the PR.
I think this is better fixed in TF's direct_session (tensorflow/tensorflow#39743) than in TF Serving.
Though I am wondering why you have so many keys in your setup. If the input/output ordering were kept consistent across requests, we should not see this many keys, no?
I think I didn't make it clear: the root cause is that the inputs map in PredictRequest is not an ordered map.
message PredictRequest {
  // Model Specification. If version is not specified, will use the latest
  // (numerical) version.
  ModelSpec model_spec = 1;

  // Input tensors.
  // Names of input tensor are alias names. The mapping from aliases to real
  // input tensor names is stored in the SavedModel export as a prediction
  // SignatureDef under the 'inputs' field.
  map<string, TensorProto> inputs = 2;

  // Output filter.
  // Names specified are alias names. The mapping from aliases to real output
  // tensor names is stored in the SavedModel export as a prediction
  // SignatureDef under the 'outputs' field.
  // Only tensors specified here will be run/fetched and returned, with the
  // exception that when none is specified, all tensors specified in the
  // named signature will be run/fetched and returned.
  repeated string output_filter = 3;
}
Then the inputs in PredictRequest can be iterated in any order inside RunPredict, even when the gRPC request message sent is identical.
Status RunPredict(const RunOptions& run_options,
                  const MetaGraphDef& meta_graph_def,
                  const optional<int64>& servable_version, Session* session,
                  const PredictRequest& request, PredictResponse* response) {
  // Validate signatures.
  const string signature_name = request.model_spec().signature_name().empty()
                                    ? kDefaultServingSignatureDefKey
                                    : request.model_spec().signature_name();
  auto iter = meta_graph_def.signature_def().find(signature_name);
  if (iter == meta_graph_def.signature_def().end()) {
    return errors::FailedPrecondition(strings::StrCat(
        "Serving signature key \"", signature_name, "\" not found."));
  }
  SignatureDef signature = iter->second;

  MakeModelSpec(request.model_spec().name(), signature_name, servable_version,
                response->mutable_model_spec());

  std::vector<std::pair<string, Tensor>> input_tensors;
  std::vector<string> output_tensor_names;
  std::vector<string> output_tensor_aliases;
  TF_RETURN_IF_ERROR(PreProcessPrediction(signature, request, &input_tensors,
                                          &output_tensor_names,
                                          &output_tensor_aliases));
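The idea behind sorting in PreProcessPrediction is to make the order handed to the session deterministic, so the executor cache key derived from it is stable across requests. Below is a minimal sketch of that idea only, not the actual patch: SortedInputs, TensorLike and ProtoMapLike are invented stand-ins for the real helper, tensorflow::Tensor and google::protobuf::Map.

#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-ins so the sketch is self-contained.
struct TensorLike {};
using ProtoMapLike = std::unordered_map<std::string, TensorLike>;

std::vector<std::pair<std::string, TensorLike>> SortedInputs(
    const ProtoMapLike& inputs) {
  // Collect the feed aliases and sort them, so the order no longer depends
  // on how the proto map happens to iterate.
  std::vector<std::string> aliases;
  aliases.reserve(inputs.size());
  for (const auto& entry : inputs) aliases.push_back(entry.first);
  std::sort(aliases.begin(), aliases.end());

  // Build the (alias, tensor) pairs in that deterministic order.
  std::vector<std::pair<std::string, TensorLike>> input_tensors;
  input_tensors.reserve(aliases.size());
  for (const auto& alias : aliases) {
    input_tensors.emplace_back(alias, inputs.at(alias));
  }
  return input_tensors;
}

With the pairs built in sorted order, identical requests always produce the same feed ordering, so DirectSession sees one key per distinct feed set instead of one per permutation.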
For example:
map<string, TensorProto> inputs = 2;
There are 3 inputs, "feature1", "feature2" and "feature3", in request.inputs, but the iteration order can differ even when the same gRPC message is sent.
for (auto& input : request.inputs()) {
  const string& alias = input.first;
  std::cout << alias << std::endl;
}
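For a consistent order, one option (a sketch of the idea, not the exact PR change; it assumes the same request object and needs <algorithm> and <vector>) is to iterate over a sorted copy of the aliases instead of the map itself:

// Collect the aliases, sort them, then iterate in that deterministic order.
std::vector<string> aliases;
aliases.reserve(request.inputs().size());
for (const auto& input : request.inputs()) {
  aliases.push_back(input.first);
}
std::sort(aliases.begin(), aliases.end());
for (const string& alias : aliases) {
  std::cout << alias << std::endl;  // always feature1, feature2, feature3
}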