WIP: Mscoco tf2

Open davidslater opened this issue 2 years ago • 7 comments

Fixes #1330

This update should fix it from our side.

However, I now run into the following error in the ART import:

File "/workspace/armory/baseline_models/tf_graph/mscoco_frcnn.py", line 20, in get_art_model
model = TensorFlowFasterRCNN(
        └ <class 'art.estimators.object_detection.tensorflow_faster_rcnn.TensorFlowFasterRCNN'>
File "/opt/conda/lib/python3.8/site-packages/art/estimators/object_detection/tensorflow_faster_rcnn.py", line 161, in __init__
self._model, self._predictions, self._losses, self._detections = self._load_model(
│    │       │                  │             │                  │    └ <staticmethod object at 0x7fd5a6a40e80>
│    │       │                  │             │                  └ TensorFlowFasterRCNN(sess=None, channels_first=False, model=None, clip_values=[0. 1.], preprocessing=StandardisationMeanStd(m...
│    │       │                  │             └ TensorFlowFasterRCNN(sess=None, channels_first=False, model=None, clip_values=[0. 1.], preprocessing=StandardisationMeanStd(m...
│    │       │                  └ TensorFlowFasterRCNN(sess=None, channels_first=False, model=None, clip_values=[0. 1.], preprocessing=StandardisationMeanStd(m...
│    │       └ TensorFlowFasterRCNN(sess=None, channels_first=False, model=None, clip_values=[0. 1.], preprocessing=StandardisationMeanStd(m...
│    └ None
└ TensorFlowFasterRCNN(sess=None, channels_first=False, model=None, clip_values=[0. 1.], preprocessing=StandardisationMeanStd(m...
File "/opt/conda/lib/python3.8/site-packages/art/estimators/object_detection/tensorflow_faster_rcnn.py", line 263, in _load_model
from object_detection.builders import model_builder
File "/opt/conda/lib/python3.8/site-packages/object_detection/builders/model_builder.py", line 55, in <module>
from object_detection.models import center_net_resnet_v1_fpn_feature_extractor
File "/opt/conda/lib/python3.8/site-packages/object_detection/models/center_net_resnet_v1_fpn_feature_extractor.py", line 24, in <module>
from object_detection.models.keras_models import resnet_v1
File "/opt/conda/lib/python3.8/site-packages/object_detection/models/keras_models/resnet_v1.py", line 24, in <module>
from tensorflow.python.keras.applications import resnet
ModuleNotFoundError: No module named 'tensorflow.python.keras.applications'

The error is in the object_detection codebase, so it is not something we can easily fix from our side. Maybe this requires moving to resnet_v2?
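One quick way to tell whether a given container is affected is to check whether the module path from the traceback can be located at all. This is only a minimal sketch using the standard library; `module_available` is a hypothetical helper name, not part of armory, ART, or object_detection, and the two module names are copied from the traceback above.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located on the current Python path.

    Hypothetical helper for illustration only. Note that find_spec imports
    the parent packages of a dotted name, but not the named module itself.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # A parent package (for example `tensorflow`) is missing entirely.
        return False


if __name__ == "__main__":
    for candidate in (
        "tensorflow.python.keras.applications",    # import that fails in resnet_v1.py
        "object_detection.builders.model_builder",  # import that ART's _load_model performs
    ):
        print(candidate, module_available(candidate))
```

Running this inside the container should show which of the two imports is actually unavailable, without having to build the full model first.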

davidslater avatar Mar 18 '22 18:03 davidslater

I'm trying to instantiate the model with the following kwargs instead:

filename="faster_rcnn_inception_resnet_v2_640x640_coco17_tpu-8",
url="http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_inception_resnet_v2_640x640_coco17_tpu-8.tar.gz",

This yields the following error in the TensorFlowFasterRCNN's _load_model():

    model = TensorFlowFasterRCNN(
  File "/opt/conda/lib/python3.8/site-packages/art/estimators/object_detection/tensorflow_faster_rcnn.py", line 161, in __init__
    self._model, self._predictions, self._losses, self._detections = self._load_model(
  File "/opt/conda/lib/python3.8/site-packages/art/estimators/object_detection/tensorflow_faster_rcnn.py", line 316, in _load_model
    vars_in_ckpt = variables_helper.get_variables_available_in_checkpoint(
  File "/opt/conda/lib/python3.8/site-packages/object_detection/utils/variables_helper.py", line 152, in get_variables_available_in_checkpoint
    ckpt_reader = tf.train.NewCheckpointReader(checkpoint_path)
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/training/py_checkpoint_reader.py", line 96, in NewCheckpointReader
    error_translator(e)
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/training/py_checkpoint_reader.py", line 31, in error_translator
    raise errors_impl.NotFoundError(None, None, error_message)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /tmp/.art/data/faster_rcnn_inception_resnet_v2_640x640_coco17_tpu-8/model.ckpt

Looking inside this directory:

I have no name!@371060d2031a:/workspace$ ls /tmp/.art/data/faster_rcnn_inception_resnet_v2_640x640_coco17_tpu-8/
checkpoint  pipeline.config  saved_model

The checkpoint is downloaded and exists, but at a different path than ART expects here. It looks like this might require a modification to ART, so I've notified Beat.
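To make the layout mismatch concrete, here is a rough sketch of how a loader could look for either layout. It is not what ART does; `find_checkpoint_prefix` is a hypothetical helper. It also assumes that the `checkpoint` subdirectory holds files such as `ckpt-0.index`, which the `ls` output above does not actually show.

```python
from pathlib import Path
from typing import Optional


def find_checkpoint_prefix(model_dir: str) -> Optional[str]:
    """Return a checkpoint prefix found under model_dir, or None.

    Hypothetical helper for illustration; not part of ART or armory.
    """
    root = Path(model_dir)

    # TF1-style layout, matching the path in the error: <dir>/model.ckpt(.index)
    if (root / "model.ckpt.index").exists():
        return str(root / "model.ckpt")

    # Assumed TF2-style layout: <dir>/checkpoint/<prefix>.index
    ckpt_dir = root / "checkpoint"
    if ckpt_dir.is_dir():
        index_files = sorted(ckpt_dir.glob("*.index"))
        if index_files:
            # Take the lexicographically last index file and drop ".index"
            # to obtain the prefix that checkpoint readers expect.
            return str(index_files[-1].with_suffix(""))

    return None


if __name__ == "__main__":
    print(find_checkpoint_prefix(
        "/tmp/.art/data/faster_rcnn_inception_resnet_v2_640x640_coco17_tpu-8"
    ))
```

If this returns a prefix inside the `checkpoint` subdirectory, that would confirm the files are present and only the expected path differs.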

A note on my environment: I'm inside a TF2 container, but I modified the Dockerfile to git checkout a more recent commit of tensorflow/models/research

lcadalzo avatar Mar 30 '22 17:03 lcadalzo

It shouldn't be downloading to /tmp/.art/data/ if ART_DATA_PATH is set in armory/__init__.py. Are you importing armory before calling it?

davidslater avatar Mar 30 '22 18:03 davidslater

No, I was running the minimum code needed to reproduce the error:

from art.estimators.object_detection.tensorflow_faster_rcnn import TensorFlowFasterRCNN
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

images = tf.placeholder(tf.float32, shape=(1, None, None, 3))

model = TensorFlowFasterRCNN(
    images,
    model=None,
    filename="faster_rcnn_inception_resnet_v2_640x640_coco17_tpu-8",
    url="http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_inception_resnet_v2_640x640_coco17_tpu-8.tar.gz",
)

However, I just tried adding an import armory at the top of the file, and I get the same /tmp/.art/data path in the error message. I also notice this log:

2022-03-30 20:27:14  0s INFO     art.config:set_data_path:55 set ART_DATA_PATH to /tmp/.art/data

Edit: OK, I see now that the change you're referring to exists on develop but not on the mscoco-tf2 branch I'm running from, which explains it. Do you expect this to have any effect on the error? At a glance, I would assume that downloading to a different path would still produce the same error (just with a different path), since the download location is unrelated to the assumptions the ART code makes about how to restore a TF checkpoint.

lcadalzo avatar Mar 30 '22 20:03 lcadalzo

Yeah, reading more fully, I think you're right - you'll still get the same error.

davidslater avatar Mar 30 '22 20:03 davidslater

Yeah, I think it should point to the checkpoint subdirectory instead of an individual file.

Also, it looks like this method of loading is deprecated in TF2: https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/init_from_checkpoint

davidslater avatar Mar 30 '22 20:03 davidslater

Added ART issue 1616

lcadalzo avatar Apr 01 '22 16:04 lcadalzo

Should be in ART 1.11

davidslater avatar Jun 01 '22 22:06 davidslater