
Why can't I obtain the right .pb file with save.py and create_pb.py?


I can obtain a .pb file from save.py and create_pb.py using the downloaded model.ckpt-240000 checkpoint, but I cannot get correct prediction results from that .pb with try_detector.ipynb or face_detector.py. However, I do get correct predictions with the downloaded model-step-240000.pb. So, when producing the .pb file with save.py and create_pb.py, are there any tricks or special steps to pay attention to? Thanks!
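
(For context, a generic way to run inference from a frozen .pb in TF 1.x is sketched below. This is not the repo's try_detector.ipynb or face_detector.py code, and the tensor names are assumptions that must be replaced with the graph's real input/output names.)

```python
import numpy as np
import tensorflow as tf

# a generic TF 1.x sketch of running inference from a frozen .pb;
# 'image_tensor:0' and 'boxes:0' are hypothetical tensor names
graph_def = tf.GraphDef()
with tf.gfile.GFile('model.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

image = np.zeros((1, 512, 512, 3), dtype=np.uint8)  # dummy input; shape/dtype are assumptions
with tf.Session(graph=graph) as sess:
    boxes = sess.run('boxes:0', feed_dict={'image_tensor:0': image})
```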

zmhik avatar Aug 08 '18 08:08 zmhik

Hi. It is weird. Here is how I do it:

  1. Place downloaded checkpoint into models/run00/.
  2. Run rm export/ -rf; python save.py.
  3. Run python create_pb.py -s export/run00/SOMENUMBER/ (the timestamped export directory; see the sketch below).
  4. The result is model.pb. Use it.
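
(SOMENUMBER is the Unix-timestamp directory that export_savedmodel creates under export/run00/. A small sketch to locate the newest one, assuming that directory holds only these exports:)

```python
import glob

# pick the newest timestamped SavedModel directory created by save.py,
# e.g. export/run00/1552559947/
export_dir = sorted(glob.glob('export/run00/*/'))[-1]
print(export_dir)  # pass this to: python create_pb.py -s <export_dir>
```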

TropComplique avatar Aug 24 '18 11:08 TropComplique

@TropComplique I am trying to freeze your graph again in TF 1.20. When I follow your instructions above, save.py doesn't work. I created a new directory called models/run00, placed the downloaded checkpoint into it, and then ran python save.py from the top-level directory. This is the error:

(/media/az/Secondary/Work/face-rec/src/fr-env) az@ryzen:/media/az/Secondary/Work/face-rec/src/code/FaceBoxes-tensorflow$ python save.py 
INFO:tensorflow:Using config: {'_model_dir': 'models/run00', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': gpu_options {
  visible_device_list: "0"
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fa77f2b7208>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Traceback (most recent call last):
  File "save.py", line 40, in <module>
    OUTPUT_FOLDER, serving_input_receiver_fn
  File "/media/az/Secondary/Work/face-rec/src/fr-env/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 663, in export_savedmodel
    mode=model_fn_lib.ModeKeys.PREDICT)
  File "/media/az/Secondary/Work/face-rec/src/fr-env/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 789, in _export_saved_model_for_mode
    strip_default_attrs=strip_default_attrs)
  File "/media/az/Secondary/Work/face-rec/src/fr-env/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 876, in _export_all_saved_models
    self._model_dir)
  File "/media/az/Secondary/Work/face-rec/src/fr-env/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_management.py", line 331, in latest_checkpoint
    ckpt = get_checkpoint_state(checkpoint_dir, latest_filename)
  File "/media/az/Secondary/Work/face-rec/src/fr-env/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_management.py", line 273, in get_checkpoint_state
    + checkpoint_dir)
ValueError: Invalid checkpoint state loaded from models/run00

Do I need to place a prototxt file inside the checkpoint directory?

For clarity, this is my directory structure:

--save.py
--create_pb.py
--...
--models/
----run00/
------checkpoint/
--------model.ckpt-240000.data-00000-of-00001
--------model.ckpt-240000.index
--------model.ckpt-240000.meta
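
(For reference, the failure can be reproduced in isolation: tf.train.latest_checkpoint reads the plain-text checkpoint index file inside the model directory. A minimal diagnostic sketch:)

```python
import tensorflow as tf

# fails with the same ValueError for the structure above, because
# models/run00/checkpoint is a directory rather than the plain-text
# index file the Estimator expects to find there
print(tf.train.latest_checkpoint('models/run00'))
```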

azmathmoosa avatar Jan 22 '19 02:01 azmathmoosa

Success! I had to place a new file called checkpoint with the following content:

model_checkpoint_path: "model.ckpt-240000"
all_model_checkpoint_paths: "model.ckpt-240000"

My directory structure is now:

--save.py
--create_pb.py
--...
--models/
----run00/
------checkpoint
------model.ckpt-240000.data-00000-of-00001
------model.ckpt-240000.index
------model.ckpt-240000.meta
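
(For reference, this index file can also be written with TensorFlow itself instead of by hand; a sketch using the TF 1.x API:)

```python
import tensorflow as tf

# writes models/run00/checkpoint pointing at the downloaded checkpoint;
# equivalent to creating the two-line file above by hand
tf.train.update_checkpoint_state(
    save_dir='models/run00',
    model_checkpoint_path='model.ckpt-240000')
```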

azmathmoosa avatar Jan 22 '19 02:01 azmathmoosa

I get a similar issue on TensorFlow 1.12.0 if the additional checkpoint file mentioned by @azmathmoosa is not present. It looks like this is related to the TensorFlow version, so which TensorFlow version is master supposed to work with?

INFO:tensorflow:Using config: {'_model_dir': 'models/run00', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': gpu_options {
  visible_device_list: "0"
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x12b0c7240>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Traceback (most recent call last):
  File "save.py", line 43, in <module>
    OUTPUT_FOLDER, serving_input_receiver_fn
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 663, in export_savedmodel
    mode=model_fn_lib.ModeKeys.PREDICT)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 789, in _export_saved_model_for_mode
    strip_default_attrs=strip_default_attrs)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 878, in _export_all_saved_models
    raise ValueError("Couldn't find trained model at %s." % self._model_dir)
ValueError: Couldn't find trained model at models/run00.

With the fix, I get output like this:

python save.py

INFO:tensorflow:Using config: {'_model_dir': 'models/run00', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': gpu_options {
  visible_device_list: "0"
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x128c2e240>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['outputs', 'serving_default']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
2019-03-14 13:39:08.654650: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
INFO:tensorflow:Restoring parameters from models/run00/model.ckpt-240000
WARNING:tensorflow:From /usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py:1044: calling SavedModelBuilder.add_meta_graph_and_variables (from tensorflow.python.saved_model.builder_impl) with legacy_init_op is deprecated and will be removed in a future version.
Instructions for updating:
Pass your op to the equivalent parameter main_op instead.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: export/run00/temp-b'1552559947'/saved_model.pb

I wonder why this custom conversion pipeline is used, i.e. why not a direct checkpoint-to-.pb conversion?
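
(A direct checkpoint-to-.pb freeze in TF 1.x would look roughly like the sketch below, assuming the output node names are known; the names used here are hypothetical and must match the actual graph. The repo instead goes through Estimator.export_savedmodel, which also bakes the serving input pipeline and signatures into the exported graph.)

```python
import tensorflow as tf

# a sketch of direct checkpoint -> frozen .pb conversion in TF 1.x;
# OUTPUT_NODES is an assumption and must match the graph's real node names
CKPT = 'models/run00/model.ckpt-240000'
OUTPUT_NODES = ['boxes', 'scores', 'num_boxes']  # hypothetical

with tf.Session(graph=tf.Graph()) as sess:
    saver = tf.train.import_meta_graph(CKPT + '.meta')
    saver.restore(sess, CKPT)
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), OUTPUT_NODES)

with tf.gfile.GFile('model_frozen.pb', 'wb') as f:
    f.write(frozen.SerializeToString())
```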

mrgloom avatar Mar 14 '19 10:03 mrgloom