fashionpedia-api

Module error in New tensorflow/tpu inference code

Open gireek opened this issue 3 years ago • 16 comments

Hi, first of all, great work on the new SpineNet-143 model and its amazing accuracy. I am facing an error while running the fashionpedia project of tensorflow/tpu on Colab. This is my notebook.

When I do this inside the file inference.py

import sys
sys.path.insert(0, '/content/tpu/models/official/detection')

the previous error in my notebook goes away, but then this error appears:

from hyperparameters import params_dict
ModuleNotFoundError: No module named 'hyperparameters'

How do I import all the files properly? Please guide me. Thanks @KMnP @richardaecn. It would be great if you could take some time to release a Colab showing beginners how to do inference.

gireek avatar Jul 30 '20 19:07 gireek

Hi @gireek. It looks like some of the directories are not included in your system path. The root dir should be tpu/models/official/detection/. I haven't finished releasing all the code yet. TODOs are:

  • [ ] Data conversion code (convert raw jpeg and annotations to TFRecord).
  • [ ] Test on my local machine.
  • [ ] Update README and tutorial.

I will find some time to finish during this weekend or next week.

But the usage should be almost the same as https://github.com/tensorflow/tpu/blob/master/models/official/detection/GETTING_STARTED.md, since I am following the same structure as the main codebase.

I tried inference.py on a remote server and it works well with SpineNet-143. Here are the results on the validation set [link]

richardaecn avatar Jul 30 '20 21:07 richardaecn

Hi @richardaecn

It will be very helpful to see the README and tutorial you said you will upload shortly; they will help us resolve the issues we are currently facing. Your results look really good! I would love to run the model and see them myself.

Thanks!

mitraavi avatar Jul 30 '20 23:07 mitraavi

@richardaecn

Could you please tell me which environment you are running inference.py in?

I'm getting a "key not found" error when trying to run inference.py, e.g.:

"Key fpn/l3/bias not found in checkpoint, Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint"

I'm now trying on Windows with TensorFlow 1.15.0 (CPU version). I have set all the paths as required.

Thanks

mitraavi avatar Jul 31 '20 20:07 mitraavi

@mitraavi

Looks like you didn't update the config. The default config in https://github.com/tensorflow/tpu/blob/master/models/official/detection/projects/fashionpedia/configs/model_config.py uses resnet50-FPN, so the model is looking for FPN weights. SpineNet doesn't have FPN.

Please pass the corresponding model config in https://github.com/tensorflow/tpu/tree/master/models/official/detection/projects/fashionpedia/configs/yaml to the config_file flag of inference.py.

richardaecn avatar Jul 31 '20 21:07 richardaecn
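One way to confirm this kind of config/checkpoint mismatch before running inference is to list the variable names stored in the checkpoint and check whether any of them start with the prefix the model graph expects. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, the example variable names are made up, and it assumes `tf.train.list_variables` is available in the installed TensorFlow.

```python
def missing_prefixes(variable_names, required_prefixes):
    """Return the required prefixes that no checkpoint variable starts with."""
    return [
        prefix for prefix in required_prefixes
        if not any(name.startswith(prefix) for name in variable_names)
    ]


def checkpoint_variable_names(checkpoint_path):
    """List the variable names stored in a TensorFlow checkpoint."""
    # Imported lazily so missing_prefixes() can be used without TensorFlow.
    import tensorflow as tf
    return [name for name, _shape in tf.train.list_variables(checkpoint_path)]


# Made-up variable names standing in for a checkpoint without FPN weights.
example_names = ["spinenet/conv2d/kernel", "rpn_head/rpn/kernel"]
print(missing_prefixes(example_names, ["fpn/"]))
```

With a real checkpoint, `missing_prefixes(checkpoint_variable_names(path), ["fpn/"])` returning a non-empty list would suggest that the checkpoint has no FPN weights and that the default config does not match it.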

@richardaecn,

Thanks for your reply. Yes, my config was wrong. After selecting the SpineNet config, I'm now able to run inference on CPU. I ran both ResNet-101 and SpineNet-143, and the outputs are pretty impressive!

As you said, I am looking forward to the following from you:

  • Data conversion code (convert raw jpeg and annotations to TFRecord).
  • Test on my local machine.
  • Update README and tutorial.

Thanks! once again.

mitraavi avatar Aug 03 '20 16:08 mitraavi

@gireek here are steps I followed to run inference:

Download tpu-master from

https://github.com/tensorflow/tpu

Install coco-api:

pip install pycocotools-windows

Install TensorFlow 1.15.0 (CPU):

pip install tensorflow==1.15.0

Add the following lines to inference.py:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

After these, add:

import sys
sys.path.insert(0, "PATH TO /tpu-master/models/official/detection")
sys.path.insert(1, "PATH TO /tpu-master/models/official/efficientnet")

Add the models directory to the PYTHONPATH environment variable. On Windows, set it to:

PATH TO /tpu-master/models

On Ubuntu:

export PYTHONPATH="$PYTHONPATH:/path/to/models"

mitraavi avatar Aug 04 '20 02:08 mitraavi
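The path setup in the steps above can also be done in one place at the top of a notebook or script. This is a minimal sketch under assumptions: `add_repo_paths` is a hypothetical helper, `/content/tpu` is a hypothetical clone location, and it assumes the `hyperparameters` package sits directly under `models/`, which is why that directory is added alongside the two `official/` subdirectories.

```python
import os
import sys


def add_repo_paths(repo_root):
    """Prepend the tensorflow/tpu source directories to sys.path.

    repo_root is the directory that contains `models/` (for example the
    unpacked `tpu-master` folder). Returns the list of paths that were added.
    """
    models_dir = os.path.join(repo_root, "models")
    candidates = [
        models_dir,  # assumed parent of the `hyperparameters` package
        os.path.join(models_dir, "official", "detection"),
        os.path.join(models_dir, "official", "efficientnet"),
    ]
    added = []
    for path in candidates:
        if path not in sys.path:
            # Insert at the front, keeping the order of `candidates`.
            sys.path.insert(len(added), path)
            added.append(path)
    return added


# Hypothetical Colab location; adjust to wherever the repo was cloned.
add_repo_paths("/content/tpu")
```

Changing `sys.path` this way only affects the current Python process. A script launched with `!python` from a notebook runs in a new process, so it needs either the PYTHONPATH variable or the same lines inside the script itself.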

@richardaecn as you said earlier, could you kindly share the following:

  • Data conversion code (convert raw jpeg and annotations to TFRecord).

  • Test on my local machine.

  • Update README and tutorial.

It will be very helpful for us to see how to prepare the data and train the Attribute Mask R-CNN network.

Thank You

mitraavi avatar Aug 14 '20 05:08 mitraavi

Hey, can anyone please guide me in running inference.py? I am trying to run the API on Google Colab, but I am facing a few errors while running inference.py.

I would be really thankful if any of you could help me get it working. Thanks!

amrahsmaytas avatar Sep 01 '20 17:09 amrahsmaytas

I am stuck at the same place as @gireek. Did you resolve it? If so, please guide me through the fix.

amrahsmaytas avatar Sep 01 '20 17:09 amrahsmaytas

@gireek I did not run it in Colab, but I could run it on my laptop without errors. I already shared the steps earlier; please check my comments above.

mitraavi avatar Sep 06 '20 03:09 mitraavi

@richardaecn @KMnP

This is how I called the inference:

!python ./inference.py \
  --checkpoint_path="./projects/fashionpedia/fashionpedia-spinenet-143/model.ckpt" \
  --label_map_file="./projects/fashionpedia/dataset/fashionpedia_label_map.csv" \
  --image_file_pattern="./nord1.jpeg" \
  --output_html="./test.html" \
  --max_boxes_to_draw=10 \
  --min_score_threshold=0.05

and this is the error. Please guide:

/opt/anaconda3/lib/python3.7/site-packages/absl/flags/_validators.py:359: UserWarning: Flag --model has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
/opt/anaconda3/lib/python3.7/site-packages/absl/flags/_validators.py:359: UserWarning: Flag --checkpoint_path has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
/opt/anaconda3/lib/python3.7/site-packages/absl/flags/_validators.py:359: UserWarning: Flag --label_map_file has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
/opt/anaconda3/lib/python3.7/site-packages/absl/flags/_validators.py:359: UserWarning: Flag --image_file_pattern has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
/opt/anaconda3/lib/python3.7/site-packages/absl/flags/_validators.py:359: UserWarning: Flag --output_html has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
 - Loading the label map...
WARNING:tensorflow:From /home/jupyter/beauty_demo/tpu/models/official/detection/modeling/architecture/nn_ops.py:387: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
W0929 14:08:35.918150 140065166116608 deprecation.py:323] From /home/jupyter/beauty_demo/tpu/models/official/detection/modeling/architecture/nn_ops.py:387: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
WARNING:tensorflow:From /opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/layers/convolutional.py:424: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W0929 14:08:35.919402 140065166116608 deprecation.py:323] From /opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/layers/convolutional.py:424: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /home/jupyter/beauty_demo/tpu/models/official/detection/modeling/architecture/nn_ops.py:184: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.BatchNormalization instead.  In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.batch_normalization` documentation).
W0929 14:08:35.928565 140065166116608 deprecation.py:323] From /home/jupyter/beauty_demo/tpu/models/official/detection/modeling/architecture/nn_ops.py:184: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.BatchNormalization instead.  In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.batch_normalization` documentation).
WARNING:tensorflow:From /home/jupyter/beauty_demo/tpu/models/official/detection/modeling/architecture/resnet.py:232: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.MaxPooling2D instead.
W0929 14:08:35.942683 140065166116608 deprecation.py:323] From /home/jupyter/beauty_demo/tpu/models/official/detection/modeling/architecture/resnet.py:232: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.MaxPooling2D instead.
I0929 14:08:35.944797 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
I0929 14:08:36.031078 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
I0929 14:08:36.094928 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
I0929 14:08:36.159844 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
I0929 14:08:36.249165 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
I0929 14:08:36.315292 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
I0929 14:08:36.382416 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
I0929 14:08:36.448992 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
I0929 14:08:36.544984 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
I0929 14:08:36.616196 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
I0929 14:08:36.685314 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
I0929 14:08:36.755772 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
I0929 14:08:36.826793 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
I0929 14:08:36.898900 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
I0929 14:08:36.996672 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
I0929 14:08:37.066566 140065166116608 nn_blocks.py:137] -----> Building bottleneck block.
INFO:tensorflow:Computing number of FLOPs before NMS...
I0929 14:08:38.143349 140065166116608 retinanet_model.py:88] Computing number of FLOPs before NMS...
I0929 14:08:38.147276 140065166116608 benchmark_utils.py:33] number of trainable params: 34.091159 M.
WARNING:tensorflow:From /opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/profiler/internal/flops_registry.py:142: tensor_shape_from_node_def_name (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.tensor_shape_from_node_def_name`
W0929 14:08:38.148303 140065166116608 deprecation.py:323] From /opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/profiler/internal/flops_registry.py:142: tensor_shape_from_node_def_name (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.tensor_shape_from_node_def_name`
3 ops no flops stats due to incomplete shapes.
Parsing Inputs...
I0929 14:08:38.734824 140065166116608 benchmark_utils.py:41] number of FLOPS (multi-adds) per image: 96.936275 B.
WARNING:tensorflow:From /home/jupyter/beauty_demo/tpu/models/official/detection/ops/postprocess_ops.py:175: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0929 14:08:39.035172 140065166116608 deprecation.py:323] From /home/jupyter/beauty_demo/tpu/models/official/detection/ops/postprocess_ops.py:175: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2020-09-29 14:08:41.001581: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-09-29 14:08:41.696779: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-29 14:08:41.697372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2020-09-29 14:08:41.697633: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-29 14:08:41.698928: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-29 14:08:41.700052: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-29 14:08:41.700340: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-29 14:08:41.701792: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-29 14:08:41.702870: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-29 14:08:41.706676: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-29 14:08:41.706808: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-29 14:08:41.707442: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-29 14:08:41.707999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-29 14:08:41.708382: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-09-29 14:08:41.716636: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2020-09-29 14:08:41.717633: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5616194327b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-29 14:08:41.717659: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-09-29 14:08:41.815369: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-29 14:08:41.816144: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5616194756b0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-29 14:08:41.816180: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2020-09-29 14:08:41.816382: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-29 14:08:41.817008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2020-09-29 14:08:41.817078: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-29 14:08:41.817098: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-29 14:08:41.817114: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-29 14:08:41.817130: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-29 14:08:41.817145: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-29 14:08:41.817161: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-29 14:08:41.817177: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-29 14:08:41.817237: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-29 14:08:41.817867: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-29 14:08:41.818448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-29 14:08:41.818534: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-29 14:08:41.819750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-29 14:08:41.819771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-09-29 14:08:41.819779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-09-29 14:08:41.819908: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-29 14:08:41.820556: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-29 14:08:41.821172: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14221 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
 - Loading the checkpoint...
INFO:tensorflow:Restoring parameters from ./projects/fashionpedia/fashionpedia-spinenet-143/model.ckpt
I0929 14:08:41.823671 140065166116608 saver.py:1284] Restoring parameters from ./projects/fashionpedia/fashionpedia-spinenet-143/model.ckpt
2020-09-29 14:08:42.830169: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key fpn/l3/bias not found in checkpoint
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key fpn/l3/bias not found in checkpoint
	 [[{{node save/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1290, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key fpn/l3/bias not found in checkpoint
	 [[node save/RestoreV2 (defined at /opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'save/RestoreV2':
  File "./inference.py", line 225, in <module>
    tf.app.run(main)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/opt/anaconda3/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/opt/anaconda3/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "./inference.py", line 151, in main
    saver = tf.train.Saver()
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1300, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1618, in object_graph_key_mapping
    object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 915, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./inference.py", line 225, in <module>
    tf.app.run(main)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/opt/anaconda3/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/opt/anaconda3/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "./inference.py", line 156, in main
    saver.restore(sess, FLAGS.checkpoint_path)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1306, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key fpn/l3/bias not found in checkpoint
	 [[node save/RestoreV2 (defined at /opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'save/RestoreV2':
  File "./inference.py", line 225, in <module>
    tf.app.run(main)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/opt/anaconda3/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/opt/anaconda3/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "./inference.py", line 151, in main
    saver = tf.train.Saver()
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

gireek avatar Sep 29 '20 12:09 gireek

And when I do inference with model type attribute_mask_rcnn like the following

!python ./inference.py \
  --model="attribute_mask_rcnn" \
  --checkpoint_path="./projects/fashionpedia/fashionpedia-spinenet-143/model.ckpt"\
  --label_map_file="./projects/fashionpedia/dataset/fashionpedia_label_map.csv"\
  --image_file_pattern="./nord1.jpeg" \
  --output_html="./test.html" \
  --max_boxes_to_draw=10 \
  --min_score_threshold=0.05

I get this error:

/opt/anaconda3/lib/python3.7/site-packages/absl/flags/_validators.py:359: UserWarning: Flag --model has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
/opt/anaconda3/lib/python3.7/site-packages/absl/flags/_validators.py:359: UserWarning: Flag --checkpoint_path has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
/opt/anaconda3/lib/python3.7/site-packages/absl/flags/_validators.py:359: UserWarning: Flag --label_map_file has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
/opt/anaconda3/lib/python3.7/site-packages/absl/flags/_validators.py:359: UserWarning: Flag --image_file_pattern has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
/opt/anaconda3/lib/python3.7/site-packages/absl/flags/_validators.py:359: UserWarning: Flag --output_html has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
 - Loading the label map...
Traceback (most recent call last):
  File "inference.py", line 225, in <module>
    tf.app.run(main)
  File "/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/opt/anaconda3/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/opt/anaconda3/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "inference.py", line 106, in main
    params = config_factory.config_generator(FLAGS.model)
  File "/home/jupyter/beauty_demo/tpu/models/official/detection/configs/factory.py", line 43, in config_generator
    raise ValueError('Model %s is not supported.' % model)
ValueError: Model attribute_mask_rcnn is not supported.

gireek avatar Sep 29 '20 12:09 gireek

@mitraavi @KMnP @richardaecn could you please verify the inference calls in the last two comments and suggest what changes should be made? Restoring parameters from the checkpoint did not fail for you, so please advise.

gireek avatar Sep 29 '20 14:09 gireek

@KMnP @richardaecn @mitraavi @amrahsmaytas Please provide any guidance on this issue.

gireek avatar Oct 06 '20 09:10 gireek

Hi all, if anyone has managed to run the inference code, could you provide a guide on how to run it and which environment is needed?

hosnasattar avatar Dec 02 '20 17:12 hosnasattar

Use --model="mask_rcnn" instead of --model="attribute_mask_rcnn", and also add the path of the config file you used in training via --config_file="".

It should look like:

!python ./inference.py \
  --model="mask_rcnn" \
  --checkpoint_path="./projects/fashionpedia/fashionpedia-spinenet-143/model.ckpt" \
  --config_file="" \
  --label_map_file="./projects/fashionpedia/dataset/fashionpedia_label_map.csv" \
  --image_file_pattern="./nord1.jpeg" \
  --output_html="./test.html" \
  --max_boxes_to_draw=10 \
  --min_score_threshold=0.05

Gagan54 avatar Jan 27 '21 06:01 Gagan54
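For anyone scripting this call, the same flags can be assembled in Python, which avoids losing a line-continuation backslash when the command is copied around. This is a minimal sketch: `build_inference_command` is a hypothetical helper, the flag names are the ones used in this thread, and the YAML path is a placeholder rather than an actual file name from the repository.

```python
import shlex


def build_inference_command(config_file, model="mask_rcnn"):
    """Assemble the inference.py invocation as an argument list."""
    flags = {
        "model": model,
        "checkpoint_path": "./projects/fashionpedia/fashionpedia-spinenet-143/model.ckpt",
        "config_file": config_file,
        "label_map_file": "./projects/fashionpedia/dataset/fashionpedia_label_map.csv",
        "image_file_pattern": "./nord1.jpeg",
        "output_html": "./test.html",
        "max_boxes_to_draw": 10,
        "min_score_threshold": 0.05,
    }
    return ["python", "./inference.py"] + [
        "--{}={}".format(name, value) for name, value in flags.items()
    ]


# "path/to/spinenet143.yaml" is a placeholder, not a file name from the repo.
command = build_inference_command("path/to/spinenet143.yaml")
print(" ".join(shlex.quote(arg) for arg in command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the directory that contains inference.py.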