Checkpoint file

Open chrisrn opened this issue 5 years ago • 19 comments

Can you provide the checkpoint folder (including meta file)? It is common now in tensorflow to import meta graphs.

chrisrn avatar Sep 27 '18 07:09 chrisrn

can you share the checkpoint folder? Thank you

kli017 avatar Oct 22 '18 07:10 kli017

@chrisrn Do you have the checkpoint folder already?

kli017 avatar Oct 22 '18 09:10 kli017

Yes, but it contains a more complex graph. I can give you the code for converting a protobuf file into a checkpoint, though. Inside a frozen protobuf file all variables have been converted to constants, so you can import the graph from the protobuf, recreate each constant as a variable, and export a checkpoint like this:

import os
import tensorflow as tf  # TensorFlow 1.x API

def protobuf_to_checkpoint_conversion(pb_model, ckpt_dir):

    graph = tf.Graph()
    with graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_model, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def, name='')

    with graph.as_default():
        config = tf.ConfigProto()

        with tf.Session(graph=graph, config=config) as sess:

            constant_ops = [op for op in sess.graph.get_operations() if op.type == "Const"]
            params = []
            for constant_op in constant_ops:
                shape = constant_op.outputs[0].get_shape()
                var = tf.get_variable(constant_op.name, shape=shape)
                params.append(var)

            init = tf.global_variables_initializer()
            sess.run(init)

            saver = tf.train.Saver(var_list=params)
            ckpt_path = os.path.join(ckpt_dir, 'model.ckpt')
            saver.save(sess, ckpt_path)

chrisrn avatar Oct 22 '18 11:10 chrisrn

@chrisrn Thanks a lot. The code works well!

kli017 avatar Oct 23 '18 01:10 kli017

Thanks for the convert function, but when I fine-tuned from the ckpt with the pipeline.config of ssd_mobilenet_v1_coco, TensorFlow reported that the weights of many tensors are missing from the ckpt. Could you attach your pipeline.config?

Dongshengjiang avatar Nov 25 '18 13:11 Dongshengjiang

@Dongshengjiang Have you got the pipeline.config file?

yoyomolinas avatar Jan 16 '19 18:01 yoyomolinas

Not yet

Dongshengjiang avatar Jan 17 '19 06:01 Dongshengjiang

@chrisrn Thanks for the conversion function. I realized that the conversion uses a single graph for all loading and saving, which causes the new variables to get a '_1' suffix appended to their names. This causes several issues when loading the model from the checkpoint files. I modified the function as follows so that the variables are saved under the same names they had in the protobuf file.

def protobuf_to_checkpoint_conversion(pb_model, ckpt_dir):
    graph = tf.Graph()
    with graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_model, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def,name='')

    graph2 = tf.Graph()
    with graph2.as_default():
        config = tf.ConfigProto()
        with tf.Session(graph=graph2, config=config) as sess:
            constant_ops = [op for op in graph.get_operations() if op.type == "Const"]
            params = []
            for constant_op in constant_ops:
                name = constant_op.name
                shape = constant_op.outputs[0].get_shape()
                var = tf.get_variable(name, shape=shape)
                params.append(var)

            init = tf.global_variables_initializer()
            sess.run(init)
            saver = tf.train.Saver(var_list=params)
            ckpt_path = os.path.join(ckpt_dir, 'model.ckpt')
            saver.save(sess, ckpt_path, global_step=1)
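
As a rough illustration of why the '_1' suffix shows up when everything lives in one graph: op names have to be unique within a graph, so a second op requested under an already-taken name gets a numeric suffix. The toy class below is only a simplified model of that behaviour (`ToyGraph` and `unique_name` here are hypothetical stand-ins, not TensorFlow code), but it shows why creating the variables in a fresh graph keeps the original names.

```python
class ToyGraph:
    """Simplified stand-in for per-graph name uniquification (not TensorFlow code)."""

    def __init__(self):
        self._name_counts = {}

    def unique_name(self, name):
        # The first request keeps the name; later requests get "_1", "_2", ...
        count = self._name_counts.get(name, 0)
        self._name_counts[name] = count + 1
        return name if count == 0 else "{}_{}".format(name, count)


name = "BoxPredictor_0/ClassPredictor/biases"

single = ToyGraph()
const_name = single.unique_name(name)  # constant imported from the protobuf
var_name = single.unique_name(name)    # variable created in the same graph

second = ToyGraph()
clean_name = second.unique_name(name)  # variable created in a separate graph

print(const_name, var_name, clean_name)
```

In this toy model the variable created in the same graph ends up with the '_1' suffix, while the one created in the second graph keeps the name stored in the protobuf.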
            

I am currently working on optimizing this face detector with TensorRT. I face some issues when exporting the model with object_detection.exporter.export_inference_graph from the object detection API. The error I specifically get when trying to export the frozen inference graph is this:

InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [6] rhs shape= [9] [[Node: save/Assign_2 = Assign[T=DT_FLOAT, _class=["loc:@BoxPredictor_0/ClassPredictor/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](BoxPredictor_0/ClassPredictor/biases, save/RestoreV2:2)]]

Inspection showed that the error comes from assigning variables restored from the checkpoint to tensors of a different shape in the model generated from pipeline.config. I visualized the graphs in TensorBoard and saw that the BoxPredictor_x/ClassPredictor outputs have different shapes in the checkpoint and in the config-generated model. I suppose some special config parameters were used.

I would appreciate it if anyone could share their insights on the issue, or the config file.

Thanks and best,

yoyomolinas avatar Jan 17 '19 12:01 yoyomolinas

Solution

First of all, the conversion function posted above is incomplete; variables are not loaded with trained parameters. Here is the updated version of the conversion function to load trained params into variables.

def protobuf_to_checkpoint_conversion(pb_model, ckpt_dir):
  
    graph = tf.Graph()
    with graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_model, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def,name='')

    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    dummy = np.random.random((1, 512, 512, 3))
    
    with graph.as_default():
        config = tf.ConfigProto()
        with tf.Session(graph=graph, config=config) as sess:
            constant_ops = [op for op in graph.get_operations() if op.type == "Const"]
            vars_dict = {}
            ass = []
            for constant_op in constant_ops:
                name = constant_op.name
                const = constant_op.outputs[0]
                shape = const.shape
                var = tf.get_variable(name, shape, dtype=const.dtype, initializer=tf.zeros_initializer())
                vars_dict[name] = var

            print('INFO:Initializing variables')
            init = tf.global_variables_initializer()
            sess.run(init)

            print('INFO: Loading vars')
            for constant_op in tqdm(constant_ops):
                name = constant_op.name
                if 'FeatureExtractor' in name or 'BoxPredictor' in name:
                    const = constant_op.outputs[0]
                    shape = const.shape
                    var = vars_dict[name]
                    var.load(sess.run(const, feed_dict={image_tensor:dummy}), sess)
        
            saver = tf.train.Saver(var_list=vars_dict)
            ckpt_path = os.path.join(ckpt_dir, 'model.ckpt')
            saver.save(sess, ckpt_path)
    return graph, vars_dict

If variables are not loaded, randomly initialized variables will be restored.

Moreover, I solved the above issue by setting num_classes = 2 in the pipeline.config file. For the object detection API this means that, apart from the background class, there are two more classes. This confuses me, because a binary object detector should only have two classes: the object and the background. Could someone shed some light on why num_classes was chosen to be 2 instead of 1?
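
The shapes in the error message are at least consistent with this. A minimal sketch of the arithmetic, assuming the class predictor emits one score per class plus a background slot for every anchor at a location, and assuming BoxPredictor_0 uses 3 anchors per location (as I believe the stock SSD configs do for the lowest layer; neither number is confirmed for this particular model):

```python
def class_predictor_bias_size(num_classes, anchors_per_location):
    # One score per class plus one background slot, for every anchor at a location.
    return anchors_per_location * (num_classes + 1)


ANCHORS_IN_FIRST_LAYER = 3  # assumption, not taken from the thread

graph_size = class_predictor_bias_size(1, ANCHORS_IN_FIRST_LAYER)  # num_classes = 1
ckpt_size = class_predictor_bias_size(2, ANCHORS_IN_FIRST_LAYER)   # num_classes = 2

print(graph_size, ckpt_size)  # prints: 6 9
```

Under those assumptions, lhs shape [6] is what a num_classes = 1 config builds and rhs shape [9] is what the checkpoint holds, which matches num_classes = 2. It does not explain why the model was trained that way.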

I have the ckpt and config file now, reach out if you need it.

yoyomolinas avatar Jan 17 '19 20:01 yoyomolinas

I have the ckpt and config file now, reach out if you need it.

I need it very much,thank you!

hsulin0806 avatar Jan 28 '19 03:01 hsulin0806

@yoyomolinas Can you share the checkpoint and the pipeline config file on Google Drive or Dropbox?

Thanks

deimsdeutsch avatar Feb 13 '19 18:02 deimsdeutsch

Here is the config file for everyone who requested it, @hsulin0806, @deimsdeutsch: pipeline.config.zip

yoyomolinas avatar Feb 25 '19 08:02 yoyomolinas

EDIT:

I'll leave this here in case anyone encounters the same problem.

It was complaining about there not being a key named "global_step", so I manually inserted one:

import os
import tensorflow as tf
import numpy as np
from tqdm import tqdm

def protobuf_to_checkpoint_conversion(pb_model, ckpt_dir):
  
    graph = tf.Graph()
    with graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_model, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def,name='')

    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    dummy = np.random.random((1, 512, 512, 3))
    
    with graph.as_default():
        config = tf.ConfigProto()
        with tf.Session(graph=graph, config=config) as sess:
            constant_ops = [op for op in graph.get_operations() if op.type == "Const"]
            vars_dict = {}
            ass = []

            for constant_op in constant_ops:
                name = constant_op.name
                const = constant_op.outputs[0]
                shape = const.shape
                var = tf.get_variable(name, shape, dtype=const.dtype, initializer=tf.zeros_initializer())
                vars_dict[name] = var
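
            # desperate times: the exporter expects a "global_step" key in the
            # checkpoint, so add one. It is created as a scalar (shape=[]) instead
            # of reusing the shape left over from the last constant in the loop.
            vars_dict["global_step"] = tf.get_variable(
                "global_step",
                shape=[],
                dtype=tf.int64,
                initializer=tf.zeros_initializer()
            )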

            print('INFO:Initializing variables')
            init = tf.global_variables_initializer()
            sess.run(init)
            # load_step = 0
            # global_step = tf.Variable(load_step, name="global_step", dtype=tf.int64)
            # sess.run(global_step.initializer)

            print('INFO: Loading vars')
            for constant_op in tqdm(constant_ops):
                name = constant_op.name
                if 'FeatureExtractor' in name or 'BoxPredictor' in name:
                    const = constant_op.outputs[0]
                    shape = const.shape
                    var = vars_dict[name]
                    var.load(sess.run(const, feed_dict={image_tensor:dummy}), sess)
        
            saver = tf.train.Saver(var_list=vars_dict)
            ckpt_path = os.path.join(ckpt_dir, 'model.ckpt')
            saver.save(sess, ckpt_path)
    return graph, vars_dict

This is just @yoyomolinas' code where I also insert a new item in the dictionary vars_dict


@yoyomolinas I can successfully generate the model.ckpt files using your code, however when using that checkpoint to run

export_tflite_ssd_graph.py --pipeline_config_path=pathto/pipeline.config --trained_checkpoint_prefix=pathto/model.ckpt --output_directory=pathto/outdir --add_postprocessing_op=true

it fails claiming

Key global_step not found in checkpoint

Is it something to do with how the .ckpt files are generated?

The purpose of this would be to use the generated .pb file to convert into a tflite model

Here is the complete error log:

2019-04-29 15:25:52.163393: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key global_step not found in checkpoint
Traceback (most recent call last):
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key global_step not found in checkpoint
         [[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1546, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key global_step not found in checkpoint
         [[node save/RestoreV2 (defined at /home/gustavoduartefaria/models/research/object_detection/export_tflite_ssd_graph_lib.py:285)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "export_tflite_ssd_graph.py", line 143, in <module>
    tf.app.run(main)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "export_tflite_ssd_graph.py", line 139, in main
    FLAGS.max_classes_per_detection, FLAGS.use_regular_nms)
  File "/home/gustavoduartefaria/models/research/object_detection/export_tflite_ssd_graph_lib.py", line 285, in export_tflite_graph
    saver = tf.train.Saver(**saver_kwargs)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1102, in __init__
    self.build()
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 795, in _build_internal
    restore_sequentially, reshape)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Key global_step not found in checkpoint
         [[node save/RestoreV2 (defined at /home/gustavoduartefaria/models/research/object_detection/export_tflite_ssd_graph_lib.py:285)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1556, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1830, in object_graph_key_mapping
    checkpointable.OBJECT_GRAPH_PROTO_KEY)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 371, in get_tensor
    status)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "export_tflite_ssd_graph.py", line 143, in <module>
    tf.app.run(main)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "export_tflite_ssd_graph.py", line 139, in main
    FLAGS.max_classes_per_detection, FLAGS.use_regular_nms)
  File "/home/gustavoduartefaria/models/research/object_detection/export_tflite_ssd_graph_lib.py", line 299, in export_tflite_graph
    initializer_nodes='')
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/tools/freeze_graph.py", line 151, in freeze_graph_with_def_protos
    saver.restore(sess, input_checkpoint)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1562, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key global_step not found in checkpoint
         [[node save/RestoreV2 (defined at /home/gustavoduartefaria/models/research/object_detection/export_tflite_ssd_graph_lib.py:285)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "export_tflite_ssd_graph.py", line 143, in <module>
    tf.app.run(main)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "export_tflite_ssd_graph.py", line 139, in main
    FLAGS.max_classes_per_detection, FLAGS.use_regular_nms)
  File "/home/gustavoduartefaria/models/research/object_detection/export_tflite_ssd_graph_lib.py", line 285, in export_tflite_graph
    saver = tf.train.Saver(**saver_kwargs)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1102, in __init__
    self.build()
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 795, in _build_internal
    restore_sequentially, reshape)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key global_step not found in checkpoint
         [[node save/RestoreV2 (defined at /home/gustavoduartefaria/models/research/object_detection/export_tflite_ssd_graph_lib.py:285)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Any help would be appreciated, thank you

fariagu avatar Apr 29 '19 15:04 fariagu

@fariagu I had the same issue too. What I did was go into export.py, find the line that generates the error, and comment it out. Apparently TensorFlow is trying to find and restore the global_step variable, which does not exist in the generated checkpoint file.

Of course, this is a temporary solution. If you find some better way to do this, let us know. Also, do you know what the global step variable does in a checkpoint file?

yoyomolinas avatar May 02 '19 08:05 yoyomolinas

@yoyomolinas From what I could gather, the global_step variable is a sort of step counter that is used to number checkpoint files.

If you were to call

saver.save(sess, 'model.ckpt', global_step=0)

it would append '-0' to the file name, which becomes model.ckpt-0.
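
To make the naming concrete, here is a small sketch of how the checkpoint prefix changes with global_step and which files one would typically expect next to it. The helper names are hypothetical, and the file list assumes the usual single-shard V2 checkpoint layout; this only mimics the naming, it does not call TensorFlow.

```python
def checkpoint_prefix(base, global_step=None):
    # With a global_step, "-<step>" is appended to the prefix.
    return base if global_step is None else "{}-{}".format(base, global_step)


def expected_files(prefix):
    # Typical files for a single-shard checkpoint (assumption about the layout).
    return [
        prefix + ".index",
        prefix + ".data-00000-of-00001",
        prefix + ".meta",
    ]


plain = checkpoint_prefix("pathto/model.ckpt")
numbered = checkpoint_prefix("pathto/model.ckpt", global_step=0)

print(plain)
print(numbered)
print(expected_files(numbered))
```

The prefix (without the .index/.data/.meta extensions) is what gets passed as --trained_checkpoint_prefix, so a checkpoint saved with global_step=0 would be referenced as pathto/model.ckpt-0.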

I can't say my solution is better, but the code I pasted above when I edited my comment instantiates that same global_step variable and inserts it into the vars_dict dictionary, without passing it to saver.save(), so the generated file names remain the same.

Thanks for replying 😄

fariagu avatar May 02 '19 08:05 fariagu

@yoyomolinas As I read your comments you were trying to load this model in TensorRT. I'm trying the same thing right now. I've been able to generate .uff file but when I build the engine I get an error referring to the operation FILL which is not implemented in TensorRT engine.

[TRT] UffParser: Validator error: FeatureExtractor/MobilenetV1/zeros_6: Unsupported operation _Fill 

I'm considering two possibilities: removing those operations, because I don't really see why they are there, or implementing the FILL operation as a custom plugin in the TensorRT engine.

Do you have any insight related to this?

sorny92 avatar May 30 '19 15:05 sorny92

@sorny92 First of all, before converting the graph to uff: the TensorFlow object detection API has an exporter tool that prepares detection graphs for deployment. This process involves removing some unnecessary ops, such as ASSERT ops and possibly the FILL op you described above. Check the link I provide below for an example.

Converting models to uff has strict rules. For example, if one of the TF layers is not supported by the UffParser, you have to create a custom plugin for TensorRT, and creating a custom layer is an arduous process. Instead, I used TensorFlow's TensorRT package to optimize the TF graph with TensorRT. This package skips the TF layers that are not implemented in TensorRT during optimization. Although this solution is less optimal than using the converted uff model in TensorRT, I still achieved better performance than with pure TF.

Click for examples on optimizing different models using tensorrt package of tensorflow. These examples also show how to properly export a detection model.

If you are going about implementing custom plugins in TensorRT let me know, we can collaborate.

yoyomolinas avatar Jun 01 '19 08:06 yoyomolinas

@yoyomolinas Oh yes, I tried that, but it seems I compiled my TensorFlow build from source against a different version of TensorRT. I will give it a try soon! If it doesn't work, I might go for implementing the custom layer. This one doesn't seem that hard to implement: as far as I can see in the documentation, it just fills a tensor with a value. I just don't see what the point of it is in inference, so I might have to debug my graph too.

Thanks for your help, I will keep you informed if I get to implement it.

sorny92 avatar Jun 04 '19 07:06 sorny92

@yoyomolinas I used your code for the checkpoint conversion. It's working pretty well, but I am not able to convert the exported frozen graph into a TensorRT uff model that is runnable on jetson-inference.

What might be the reason?

Varat7v2 avatar May 29 '20 13:05 Varat7v2