
Sort input/output in PreProcessPrediction

zhjunqin opened this issue on May 22 '20 · 4 comments

In direct_session.cc https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/direct_session.cc#L1514, it always emplaces the key into executors_, so a new entry is added to the map for every distinct key, which leads to very high memory usage.

With 10 input tensors there are 10! = 3,628,800 possible orderings, and each ordering produces a distinct key, so memory usage becomes very large (a sketch after the snippet below makes this concrete).

  // See if we already have the executors for this run.
  {
    mutex_lock l(executor_lock_);
    auto it = executors_.find(sorted_key);
    if (it != executors_.end()) {
      *executors_and_keys = it->second.get();

      // Insert this under the original key.  
      executors_.emplace(key, it->second); 
      return Status::OK();
    }
  }
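
To see the blow-up concretely, here is a minimal, self-contained sketch (MakeKey is a hypothetical stand-in for the real key construction, which concatenates the feed/fetch names): every iteration order of the same inputs yields a distinct cache key, while sorting first collapses them all to one.

#include <algorithm>
#include <iostream>
#include <set>
#include <string>
#include <vector>

// Hypothetical key builder: joins the input names, optionally sorting first.
std::string MakeKey(std::vector<std::string> names, bool sort_first) {
  if (sort_first) std::sort(names.begin(), names.end());
  std::string key;
  for (const std::string& n : names) key += n + ",";
  return key;
}

int main() {
  std::vector<std::string> names = {"feature1", "feature2", "feature3"};
  std::set<std::string> unsorted_keys, sorted_keys;
  std::sort(names.begin(), names.end());  // next_permutation needs a sorted start
  do {
    // Each iteration order a protobuf map could produce is one permutation.
    unsorted_keys.insert(MakeKey(names, /*sort_first=*/false));
    sorted_keys.insert(MakeKey(names, /*sort_first=*/true));
  } while (std::next_permutation(names.begin(), names.end()));
  std::cout << unsorted_keys.size() << " distinct unsorted keys vs "
            << sorted_keys.size() << " sorted key" << std::endl;
}

With 3 names this prints 6 distinct unsorted keys vs 1 sorted key; with 10 names the unsorted count is 10! = 3,628,800.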

Please check the attached file: serving_nommap.0042.heap.base0007.pdf

Also see issue https://github.com/tensorflow/serving/issues/1215

I'm not sure whether fixing the TF code or the TF Serving code would be better, so I submitted another PR, https://github.com/tensorflow/tensorflow/pull/39743; please help review it.

zhjunqin · May 22 '20

Thank you. We used the code from this PR, and it solved the out-of-memory problem in our production system.

algorithmdog · May 26 '20

Thanks for the bug report and the PR.

I think this is better fixed in (TF) direct_session (tensorflow/tensorflow#39743) than in TF Serving.

Though I am wondering why you have so many keys in your setup. If the input/output ordering is kept consistent across requests, we should not have this many keys, no?

netfs · May 29 '20

> Thanks for the bug report and the PR.
>
> I think this is better fixed in (TF) direct_session (tensorflow/tensorflow#39743) than in TF Serving.
>
> Though I am wondering why you have so many keys in your setup. If the input/output ordering is kept consistent across requests, we should not have this many keys, no?

I think I didn't make it clear: the root cause is that the inputs map in PredictRequest is not an ordered map.

message PredictRequest {
  // Model Specification. If version is not specified, will use the latest
  // (numerical) version.
  ModelSpec model_spec = 1;

  // Input tensors.
  // Names of input tensor are alias names. The mapping from aliases to real
  // input tensor names is stored in the SavedModel export as a prediction
  // SignatureDef under the 'inputs' field.
  map<string, TensorProto> inputs = 2;

  // Output filter.
  // Names specified are alias names. The mapping from aliases to real output
  // tensor names is stored in the SavedModel export as a prediction
  // SignatureDef under the 'outputs' field.
  // Only tensors specified here will be run/fetched and returned, with the
  // exception that when none is specified, all tensors specified in the
  // named signature will be run/fetched and returned.
  repeated string output_filter = 3;
}

So in the function RunPredict the inputs in PredictRequest can be iterated in any order, even when the client sends the same gRPC message.

Status RunPredict(const RunOptions& run_options,
                  const MetaGraphDef& meta_graph_def,
                  const optional<int64>& servable_version, Session* session,
                  const PredictRequest& request, PredictResponse* response) {
  // Validate signatures.
  const string signature_name = request.model_spec().signature_name().empty()
                                    ? kDefaultServingSignatureDefKey
                                    : request.model_spec().signature_name();
  auto iter = meta_graph_def.signature_def().find(signature_name);
  if (iter == meta_graph_def.signature_def().end()) {
    return errors::FailedPrecondition(strings::StrCat(
        "Serving signature key \"", signature_name, "\" not found."));
  }
  SignatureDef signature = iter->second;

  MakeModelSpec(request.model_spec().name(), signature_name, servable_version,
                response->mutable_model_spec());

  std::vector<std::pair<string, Tensor>> input_tensors;
  std::vector<string> output_tensor_names;
  std::vector<string> output_tensor_aliases;
  TF_RETURN_IF_ERROR(PreProcessPrediction(signature, request, &input_tensors,
                                          &output_tensor_names,
                                          &output_tensor_aliases));
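
A hedged sketch of what sorting here could look like (illustrative only, not the code from the PR): canonicalize the feed and fetch order right after PreProcessPrediction, so DirectSession derives the same executor key for logically identical requests. It assumes the local variables from the snippet above and additionally needs <algorithm> and <numeric>.

  // Illustrative sketch; assumes the variables from the snippet above.
  // Sort feeds by tensor name so the session key is order-independent.
  std::sort(input_tensors.begin(), input_tensors.end(),
            [](const std::pair<string, Tensor>& a,
               const std::pair<string, Tensor>& b) { return a.first < b.first; });

  // output_tensor_names and output_tensor_aliases must stay aligned, so
  // sort them through one index permutation (std::iota needs <numeric>).
  std::vector<size_t> order(output_tensor_names.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(), [&](size_t a, size_t b) {
    return output_tensor_names[a] < output_tensor_names[b];
  });
  std::vector<string> names, aliases;
  for (size_t i : order) {
    names.push_back(output_tensor_names[i]);
    aliases.push_back(output_tensor_aliases[i]);
  }
  output_tensor_names.swap(names);
  output_tensor_aliases.swap(aliases);

Sorting the fetches through one index permutation keeps output_tensor_names and output_tensor_aliases aligned, which a pair of independent sorts would break.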

zhjunqin · May 30 '20

For example:

map<string, TensorProto> inputs = 2;

There are 3 inputs, "feature1", "feature2", and "feature3", in request.inputs, but the iteration order can differ even when the same gRPC message is sent:

  for (const auto& input : request.inputs()) {
    const string& alias = input.first;
    std::cout << alias << std::endl;  // prints in no guaranteed order
  }
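
A workaround for deterministic handling (a minimal sketch, assuming request is a populated PredictRequest as above and <algorithm> is included) is to collect the alias names first and sort them before use:

  std::vector<string> aliases;
  aliases.reserve(request.inputs().size());
  for (const auto& input : request.inputs()) {
    aliases.push_back(input.first);
  }
  std::sort(aliases.begin(), aliases.end());
  for (const string& alias : aliases) {
    std::cout << alias << std::endl;  // always feature1, feature2, feature3
  }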

zhjunqin · May 30 '20