【grpc++】env_->rendezvous_mgr->RecvLocalAsync failed, error msg is: [_Derived_]End of sequence

Open kpsc opened this issue 2 years ago • 3 comments

System information

  • Have I written custom code :
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • TensorFlow installed from (source or binary): DeepRec
  • TensorFlow version : tf1.15
  • Python version: python3.6

When I used grpc++ with Estimator, I got the following error, but training still continued. I don't know whether this is OK.

image

config = tf.estimator.RunConfig(
    save_checkpoints_secs=10 * 60,
    keep_checkpoint_max=2,
    protocol='grpc++'
)
model = tf.estimator.Estimator(
    model_fn=model_fn,
    params=model_params,
    model_dir=checkpoint,
    config=config
)
eval_spec = tf.estimator.EvalSpec(...)
train_spec = tf.estimator.TrainSpec(...)
tf.estimator.train_and_evaluate(model, train_spec, eval_spec)

In the DeepRec docs, I found that there seems to be a problem with the original Estimator, but my Bazel build failed, and I don't know what the Estimator check looks like when using grpc++. With the latest DeepRec version, do we need to install Estimator separately?

kpsc, Jul 19 '22 11:07

"End of sequence" means the input data was exhausted. In general, Estimator handles this exception naturally. If you use the 'MonitoredTrainingSession' API, you may see this log. Which Estimator did you install? We provide a version on GitHub: https://github.com/AlibabaPAI/estimator/tree/deeprec
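To illustrate why training can finish normally even though this message is logged, here is a minimal pure-Python analogy. It is not TensorFlow or DeepRec code: `EndOfSequence`, `make_input_fn`, and `train` are hypothetical names used only for this sketch. The idea is that the input source signals exhaustion with an exception, and the training loop treats that signal as the normal stop condition rather than as a failure.

```python
# Pure-Python analogy only; every name here is hypothetical and not part of
# TensorFlow, Estimator, or DeepRec.

class EndOfSequence(Exception):
    """Stands in for the 'End of sequence' signal raised when input runs out."""


def make_input_fn(batches):
    """Return a get_next() callable that raises EndOfSequence when exhausted."""
    it = iter(batches)

    def get_next():
        try:
            return next(it)
        except StopIteration:
            raise EndOfSequence("End of sequence") from None

    return get_next


def train(get_next):
    """Run steps until the input is exhausted; exhaustion ends the loop cleanly."""
    steps = 0
    while True:
        try:
            batch = get_next()
        except EndOfSequence:
            # Input exhausted: the expected way for the loop to finish.
            break
        steps += 1  # a real training step would consume `batch` here
    return steps


steps_run = train(make_input_fn([[1, 2], [3, 4], [5]]))
print(steps_run)
```

In this sketch the exception never reaches the caller, which mirrors the point above: the message marks the end of the data, not a training failure.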

shanshanpt, Jul 20 '22 07:07

Thanks for your reply. I have another question: when I use grpc++ in distributed training, it is slower than grpc. Is there anything else I need to set for training? In the network, I only use normal TensorFlow embeddings.

kpsc, Jul 25 '22 03:07

There's a list of tips to help you tune grpc++; see https://deeprec.readthedocs.io/zh/latest/GRPC%2B%2B.html

liutongxuan, Jul 25 '22 03:07