Retinanet tutorial - can not run on custom data

Open • rola93 opened this issue 3 years ago • 19 comments

I've been reading and running the RetinaNet tutorial, and it works like a charm. It was introduced in #109 by @srihari-humbarwadi. Awesome work!

However, I can't understand how to prepare a custom dataset for it, and I couldn't find any reference explaining how.

I want to train my own model on a custom dataset, clean-dirty-containers-in-montevideo, but I can't figure out how to convert it from images with XML annotations to the format this tutorial expects.

I found this repo with some scripts to convert XML annotations to TFRecords, but I'm not sure whether the output is in the required format. I tried it, but it didn't work.

I think the docs should at least link to an article or another tutorial explaining how to do it.

Any help (sample code, or tutorial on this step) is appreciated.

rola93 • Mar 12 '21

Can we get a follow-up here? I'm also interested in how to load our own dataset in this example. Thanks!

barbaragabriella • Apr 30 '21

To keep the example from carrying a lot of code for processing and preparing data, we use tensorflow-datasets to load the MS-COCO dataset. If you wish to train on your own dataset, you need to convert your data into the format that this function expects: https://github.com/keras-team/keras-io/blob/9dbe6aef1082e6ef320021ef710c016b712a1379/examples/vision/retinanet.py#L374. You may also need to make the corresponding changes to the tf.data pipeline.

srihari-humbarwadi • Apr 30 '21

> you need to convert your data into a format that this function expects

I think this is exactly what needs to be covered. Maybe this tutorial is not the best place, since, as you say, it should avoid a lot of code. But a complementary article could be written and linked from here.

In my case, I couldn't preprocess my data :/

rola93 • May 04 '21

Exactly! It would help a lot to have something that processes the data into what the function expects. I have tried several times, without success :(

barbaragabriella • May 04 '21

It follows the format returned by the tensorflow_datasets builders. You can try matching it with something like this:

{
	"image": <image tensor>,
	"objects": {
		"bbox": [
				[y1, x1, y2, x2],
				[y1, x1, y2, x2],
				[y1, x1, y2, x2],
				[y1, x1, y2, x2],
				[y1, x1, y2, x2],
		],
		"label": [
				class_id_for_object_1,
				class_id_for_object_2,
				class_id_for_object_3,
				class_id_for_object_4,
				class_id_for_object_5
		]
	}
} 
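A minimal sketch of filling in that structure, assuming your annotations are absolute pixel boxes in (xmin, ymin, xmax, ymax) order. The tfds COCO boxes are normalized to [0, 1] and ordered (ymin, xmin, ymax, xmax), and the tutorial's preprocess_data multiplies them back by the image size, so custom boxes need the same treatment. The helper name `to_tfds_style_objects` is hypothetical, not part of the example.

```python
import numpy as np


def to_tfds_style_objects(boxes_xyxy, labels, width, height):
    """Convert pixel (xmin, ymin, xmax, ymax) boxes to the tfds-style
    normalized (ymin, xmin, ymax, xmax) layout.

    Assumption: `width`/`height` are the original, unresized image size.
    """
    boxes = np.asarray(boxes_xyxy, dtype=np.float32).reshape(-1, 4)
    xmin, ymin, xmax, ymax = boxes.T
    # Reorder to (y1, x1, y2, x2) and scale each coordinate into [0, 1].
    bbox = np.stack(
        [ymin / height, xmin / width, ymax / height, xmax / width], axis=-1
    )
    return {
        # Clip guards against boxes that spill slightly outside the image.
        "bbox": np.clip(bbox, 0.0, 1.0),  # shape (num_objects, 4)
        "label": np.asarray(labels, dtype=np.int64),  # shape (num_objects,)
    }


objects = to_tfds_style_objects(
    [[100, 50, 300, 150], [0, 0, 64, 32]], labels=[2, 1], width=640, height=480
)
print(objects["bbox"])
```

Note that "bbox" stays 2-D even for a single object, because the tutorial indexes it as bbox[:, 0].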

srihari-humbarwadi • May 05 '21

I tried to make a dict like this:

import pandas as pd

file_location = '/content/drive/MyDrive/Skripsi/Labels/label_train.csv'
column_names = ["filename", "width", "height", "class_label", "xmin", "ymin", "xmax", "ymax"]
df = pd.read_csv(file_location, names=column_names)
df = df.drop([0])  # drop the file's own header row

filenames = df.filename.to_list()
classes = df.class_label.to_list()
xmin = df.xmin.to_list()
ymin = df.ymin.to_list()
xmax = df.xmax.to_list()
ymax = df.ymax.to_list()

image = []
bbox = []
class_ids = []  # renamed from `id` to avoid shadowing the built-in

# for i in range(len(filenames)):
for i in range(5):  # first five rows only, for testing
    ## Image: store the file name (reading the bytes is commented out)
    # image.append(open(os.path.join(data, filenames[i]), 'rb').read())
    image.append(filenames[i])

    ## Bbox: one [xmin, ymin, xmax, ymax] list per CSV row
    bbox.append([float(xmin[i]), float(ymin[i]), float(xmax[i]), float(ymax[i])])
    print(bbox[i])

    ## Class prep
    if classes[i] == "Debris":
        class_ids.append(2)
    else:
        class_ids.append(1)

objects = {"bbox": bbox, "id": class_ids}
train_dataset = {"image": image, "objects": objects}

The resulting dict is:

{
    'image': ['00708.jpg', '01289.jpg', '01441.jpg', '00327.jpg', '01460.jpg'],
    'objects': 
          {
              'bbox': [[277.0, 427.0, 416.0, 480.0],
                      [266.0, 1.0, 347.0, 61.0],
                      [385.0, 249.0, 451.0, 320.0],
                      [89.0, 431.0, 144.0, 462.0],
                      [433.0, 274.0, 457.0, 341.0]],
              'id': [2, 2, 1, 1, 1]
           }
 }

Is this the right approach?
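Compared with the structure quoted earlier, the dict above differs in a few ways. It holds one box per CSV row instead of all boxes of an image together. Its boxes are pixel (xmin, ymin, xmax, ymax) rather than normalized (ymin, xmin, ymax, xmax). It also uses the key "id" instead of "label". A stdlib-only sketch that groups the rows per image is shown below. It assumes the same CSV columns as the snippet (with a header row) and reuses its "Debris" -> 2, otherwise -> 1 mapping. The function names are hypothetical.

```python
import csv
from collections import defaultdict

# Column order used in the pandas snippet above (the file has a header row).
COLUMNS = ["filename", "width", "height", "class_label",
           "xmin", "ymin", "xmax", "ymax"]


def class_id(name):
    # Same mapping as the snippet above: "Debris" -> 2, everything else -> 1.
    return 2 if name == "Debris" else 1


def group_by_image(csv_lines):
    """Return (filenames, bboxes, labels) with one entry per image.

    bboxes[i] holds normalized [ymin, xmin, ymax, xmax] boxes and
    labels[i] the matching class ids for filenames[i].
    """
    reader = csv.DictReader(csv_lines, fieldnames=COLUMNS)
    next(reader)  # skip the header row, like df.drop([0])
    grouped = defaultdict(lambda: {"bbox": [], "label": []})
    for row in reader:
        width, height = float(row["width"]), float(row["height"])
        entry = grouped[row["filename"]]
        entry["bbox"].append([
            float(row["ymin"]) / height,
            float(row["xmin"]) / width,
            float(row["ymax"]) / height,
            float(row["xmax"]) / width,
        ])
        entry["label"].append(class_id(row["class_label"]))
    filenames = list(grouped)  # order of first appearance in the CSV
    bboxes = [grouped[f]["bbox"] for f in filenames]
    labels = [grouped[f]["label"] for f in filenames]
    return filenames, bboxes, labels
```

Usage would be `with open(file_location, newline="") as f: filenames, bboxes, labels = group_by_image(f)`. Because images have different numbers of boxes, the inner lists have different lengths. They cannot become one rectangular tensor, so they need a ragged or generator-based input.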

lolikgiovi • May 05 '21

I have an approach that works. It's a bit of a hack, but it works just fine.

  1. Create three basic lists: one for the filenames, one for the bounding boxes, and one for the encoded labels.
  2. Cast each of these as tf.constant.
  3. Create your dataset like so: train_dataset = tf.data.Dataset.from_tensor_slices({"filename": filename, "objects": {"label": label, "bbox": bbox}})
  4. Read the image from disk in the preprocess_data function, where it is processed as part of the pipeline. Something like this:
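import tensorflow as tf

# swap_xy, random_flip_horizontal, resize_and_pad_image and convert_to_xywh
# are the helper functions defined in the RetinaNet example.
def preprocess_data(sample):
    image_string = tf.io.read_file(sample["filename"])
    # channels=3 so that grayscale JPEGs also decode to 3 channels
    image = tf.image.decode_jpeg(image_string, channels=3)

    bbox = swap_xy(sample["objects"]["bbox"])
    class_id = tf.cast(sample["objects"]["label"], dtype=tf.int32)

    # I don't want to flip horizontally, so the augmentation is commented out:
    # image, bbox = random_flip_horizontal(image, bbox)
    image, image_shape, _ = resize_and_pad_image(image)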

    bbox = tf.stack(
        [
            bbox[:, 0] * image_shape[1],
            bbox[:, 1] * image_shape[0],
            bbox[:, 2] * image_shape[1],
            bbox[:, 3] * image_shape[0],
        ],
        axis=-1,
    )
    bbox = convert_to_xywh(bbox)
    return image, bbox, class_id

rroosshhaann • May 28 '21

> I have an approach that works. It's a bit of a hack, but it works just fine.

I tried to apply the changes here but then got this error:

ValueError: Index out of range using input dim 1; input has only 1 dims for '{{node while/strided_slice_13}} = StridedSlice[Index=DT_INT32, T=DT_FLOAT, begin_mask=3, ellipsis_mask=0, end_mask=1, new_axis_mask=0, shrink_axis_mask=0](while/concat_7, while/strided_slice_13/stack, while/strided_slice_13/stack_1, while/strided_slice_13/stack_2)' with input shapes: [4], [2], [2], [2] and with computed input tensors: input[3] = <1 1>.

It seems my bounding boxes are not in a suitable shape: each one is a 1-D vector, so it can't be indexed with matrix syntax such as [:, 0], which some functions here use.
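The error message supports that diagnosis ("input has only 1 dims", input shape [4]). The example's helpers index boxes as bbox[:, 0], which needs a 2-D array of shape (num_boxes, 4) even when an image has a single box. Here is a small NumPy illustration of the failing shape and a reshape that fixes it. It is not the tutorial's code. Inside a tf.data map function the equivalent would be tf.reshape(bbox, [-1, 4]).

```python
import numpy as np

single_box = np.array([0.1, 0.2, 0.5, 0.6], dtype=np.float32)  # shape (4,)

try:
    single_box[:, 0]  # 1-D array: there is no second axis to slice
except IndexError as err:
    print("1-D box fails:", err)

# Reshape to (num_boxes, 4); -1 lets NumPy infer num_boxes (here 1).
boxes = single_box.reshape(-1, 4)
print(boxes.shape)  # (1, 4)
print(boxes[:, 0])  # first coordinate of every box
```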

@rroosshhaann Do you mind sharing your code for steps 1 to 3? Thank you

lolikgiovi • Jun 10 '21

@lolikgiovi

Did you manage to solve the problem? Do you already have a version of this code for custom datasets?

AndreAbade • Jun 23 '21

I solved the custom data input problem!

The following guide explains how to create a TensorFlow Datasets (TFDS) dataset: https://www.tensorflow.org/datasets/add_dataset. Create such a dataset from your data first. For this you need a CSV file with all annotations, which you can simply generate from your XML files.
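A minimal sketch of that XML-to-CSV step. It assumes Pascal VOC-style files: a <filename>, a <size> with <width>/<height>, and one <object> per box containing <name> and <bndbox>. Other XML layouts need different tag names. The CSV column names are hypothetical, so use whatever your dataset script reads.

```python
import csv
import xml.etree.ElementTree as ET

FIELDS = ["filename", "width", "height", "label", "xmin", "ymin", "xmax", "ymax"]


def voc_xml_to_rows(xml_text):
    """Yield one CSV row (as a dict) per <object> in a VOC-style annotation."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        yield {
            "filename": filename,
            "width": width,
            "height": height,
            "label": obj.findtext("name"),
            **{k: float(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax")},
        }


def write_csv(xml_texts, out_file):
    """Write the annotations from several XML documents into one CSV file."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    for xml_text in xml_texts:
        writer.writerows(voc_xml_to_rows(xml_text))
```

For files on disk, pass something like `[p.read_text() for p in Path("annotations").glob("*.xml")]` to write_csv, with an output file opened using `newline=""`. The directory name is a placeholder.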

Once you have created this dataset, you should have a mydataset.py file. There you need to adapt your FeaturesDict as follows:
{
    "image": tfds.features.Image(shape=(None, None, 3)),
    "objects": tfds.features.Sequence(
        {
            "bbox": tfds.features.BBoxFeature(),
            "label": tfds.features.ClassLabel(num_classes=1),  # set to your number of classes
        }
    ),
}

Since I didn't need all the features, I only included the basic ones needed for successful training. Next, the yield part needs to be adapted. I did it like this:
{
    "image": images_path / f"{image_id}.jpg",
    "objects": [
        {
            # tfds.features.BBox takes normalized (ymin, xmin, ymax, xmax)
            "bbox": tfds.features.BBox(
                float(row["ymin"]) / float(row["height"]),
                float(row["xmin"]) / float(row["width"]),
                float(row["ymax"]) / float(row["height"]),
                float(row["xmax"]) / float(row["width"]),
            ),
            "label": row["label"],
        },
    ],
}

If you then build it from the command line with tfds build, you should get the built dataset as a folder in your TensorFlow Datasets directory. Now you can simply change the dataset name in the load function, and everything should work just fine.

I hope this helps. It was quite difficult to achieve...

X-F-Lpro • Jun 25 '21

> I have an approach that works. It's a bit of a hack, but it works just fine.

@rroosshhaann This only works when there's one bbox per image, right? When I use this approach with multiple bboxes per image, I get the error "Can't convert non-rectangular Python sequence to Tensor."
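That error comes from tf.constant, which needs rectangular input, while the box lists have a different length per image. One workaround is to build the box and label lists with tf.ragged.constant instead. This sketch assumes TF 2.x, where tf.data accepts ragged tensors. Dataset.from_tensor_slices then slices them into one (num_boxes, 4) box tensor and one (num_boxes,) label tensor per image, which matches the bbox[:, 0] indexing in preprocess_data. The file names and boxes are placeholders.

```python
import tensorflow as tf

# Placeholder data: the first image has two boxes, the second has one.
filenames = ["images/a.jpg", "images/b.jpg"]
bboxes = [
    [[0.10, 0.15, 0.31, 0.47], [0.00, 0.00, 0.07, 0.10]],  # normalized y1, x1, y2, x2
    [[0.20, 0.10, 0.40, 0.30]],
]
labels = [[2, 1], [2]]

train_dataset = tf.data.Dataset.from_tensor_slices(
    {
        "filename": tf.constant(filenames),
        "objects": {
            # ragged_rank=1: the box count varies, the inner size 4 stays fixed.
            "bbox": tf.ragged.constant(bboxes, ragged_rank=1, dtype=tf.float32),
            "label": tf.ragged.constant(labels, dtype=tf.int64),
        },
    }
)

for sample in train_dataset:
    print(sample["filename"].numpy(), sample["objects"]["bbox"].shape)
```

Each element can then go through preprocess_data as before. The later padded_batch call should pad the differing box counts within a batch.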

05vald0 • Jul 15 '21

> It follows the format returned by the tensorflow_datasets builders.

I did that. After preprocess_data(), batch(2), and padded_batch, my data looks like this:

(<tf.Tensor: shape=(2, 896, 896, 3), dtype=float32, numpy=
array([[[[0., 0., 0.], [0., 0., 0.], [0., 0., 0.], ..., [0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]]], dtype=float32)>,
 <tf.Tensor: shape=(2, 4, 4), dtype=float32, numpy=
array([[[216.70493 , 518.21747 ,  91.102036, 139.47879 ],
        [450.6239  , 193.12451 , 128.74966 , 128.74968 ],
        [450.6239  , 193.12451 , 128.74966 , 128.74968 ],
        [450.6239  , 193.12451 , 128.74966 , 128.74968 ]],
       [[576.2461  , 246.96262 , 164.64172 , 164.64174 ],
        [576.2461  , 246.96262 , 164.64172 , 164.64174 ],
        [620.71826 , 486.96158 , 224.25342 , 194.15302 ],
        [399.30353 , 495.5043  , 116.384705, 102.51276 ]]], dtype=float32)>,
 <tf.Tensor: shape=(2, 4), dtype=int32, numpy=
array([[2, 2, 2, 2],
       [1, 1, 1, 1]])>)

But it fails at this step:

train_dataset = train_dataset.map(
    label_encoder.encode_batch, num_parallel_calls=autotune
)

from this pipeline:

autotune = tf.data.experimental.AUTOTUNE
train_dataset = train_dataset.map(preprocess_data, num_parallel_calls=autotune)
train_dataset = train_dataset.shuffle(batch_size)
train_dataset = train_dataset.padded_batch(
    batch_size=batch_size, padding_values=(0.0, 1e-8, -1), drop_remainder=True
)
train_dataset = train_dataset.map(
    label_encoder.encode_batch, num_parallel_calls=autotune
)

It fails even after changing num_classes to 10 or fewer.

The error:

<ipython-input-26-92d21a062ba6>:250 encode_batch  *
        label = self._encode_sample(images_shape, gt_boxes[i], cls_ids[i])
    <ipython-input-26-92d21a062ba6>:233 _encode_sample  *
        box_target = self._compute_box_target(anchor_boxes, matched_gt_boxes)
    <ipython-input-26-92d21a062ba6>:215 _compute_box_target  *
        box_target = tf.concat(
    D:\PYTHON\SII\env\lib\site-packages\tensorflow\python\framework\ops.py:870 __array__  **
        " a NumPy call, which is not supported".format(self.name))

NotImplementedError: Cannot convert a symbolic Tensor (while/truediv_16:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

Interestingly, label_encoder.encode_batch works as expected when called directly on the data, but it fails inside train_dataset.map(label_encoder.encode_batch).

Thank you

rpsantosa • Aug 01 '21

It just worked with TF 2.5.0.

rpsantosa • Aug 04 '21

@rpsantosa @X-F-Lpro Could you provide a bit more detail about implementing RetinaNet on a custom dataset?

nikeshdevkota • Oct 03 '22

I can try to give you some hints on how to proceed if you tell me what you are trying to accomplish. Since I worked on software I may not publish, I can only give code examples. Is there a certain aspect that is not working for you or that you do not understand?

X-F-Lpro • Oct 03 '22

I have a CSV file with the annotation format "path to image","xmin","ymin","xmax","ymax","class ID","class name", and I am trying to load the data in the same format as the reference COCO dataset. I tried the same approach as @05vald0 but got the same error as @lolikgiovi. I saw your solution in the comment above, but I couldn't understand the process of creating the dataset.

nikeshdevkota • Oct 04 '22

@X-F-Lpro I am working on detecting small objects, so I will also change the anchor box sizes accordingly.
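This sketch shows how anchor sizes follow from a base size. It assumes the usual RetinaNet setup from the paper: base sizes 32 to 512 px on levels P3 to P7, aspect ratios 0.5/1/2 (width/height), and scales 2^0, 2^(1/3), 2^(2/3). The Keras example appears to use the same defaults in its AnchorBox class, but check that class before editing it. The smaller size list below is purely illustrative.

```python
import math

ASPECT_RATIOS = [0.5, 1.0, 2.0]  # width / height
SCALES = [2 ** x for x in (0, 1 / 3, 2 / 3)]


def anchor_dims(base_size):
    """Return (width, height) for every ratio/scale pair at one pyramid level."""
    area = base_size ** 2
    dims = []
    for ratio in ASPECT_RATIOS:
        height = math.sqrt(area / ratio)
        width = area / height  # so width / height == ratio, area preserved
        for scale in SCALES:
            dims.append((scale * width, scale * height))
    return dims


default_sizes = [32, 64, 128, 256, 512]  # paper defaults, one per level P3..P7
small_sizes = [16, 32, 64, 128, 256]  # hypothetical; tune to your object sizes
print(min(w for w, _ in anchor_dims(default_sizes[0])))
print(min(w for w, _ in anchor_dims(small_sizes[0])))
```

Halving the base sizes halves every anchor dimension. This gives small boxes a better chance of reaching the IoU threshold the label encoder uses for positive anchors.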

nikeshdevkota • Oct 04 '22