keras-io
RetinaNet tutorial: cannot run on custom data
I've been reading the RetinaNet tutorial and running it, and it works like a charm. It was introduced in #109 by @srihari-humbarwadi. Awesome work!
However, I cannot understand how to prepare a custom dataset for this purpose, and I didn't find any reference explaining it.
I want to train my own model with a custom dataset, clean-dirty-containers-in-montevideo, but I can't figure out how to convert it from images with XML annotations to the format this tutorial expects.
I found this repo with some scripts to convert XML annotations to TFRecords, but I'm not sure whether the output is in the required format. I tried it, but it didn't work.
I think the docs should at least link to an article or another tutorial explaining how to do it.
Any help (sample code, or tutorial on this step) is appreciated.
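As a starting point for the XML side, here is a minimal sketch using only the standard library. It assumes the XML files follow the Pascal VOC layout written by tools such as LabelImg (`<size>` with `<width>`/`<height>`, and one `<object>` per box with `<name>` and `<bndbox>`); the post does not say which layout the dataset uses, so treat that as an assumption. `parse_voc_xml` and the `class_to_id` mapping are hypothetical names, not part of the tutorial.

```python
import xml.etree.ElementTree as ET


def parse_voc_xml(xml_text, class_to_id):
    """Parse one Pascal VOC-style annotation (assumed layout) into tfds-style lists.

    Returns (filename, bboxes, labels), where each box is
    [ymin, xmin, ymax, xmax] divided by the image height/width,
    i.e. normalized to [0, 1].
    """
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = float(root.findtext("size/width"))
    height = float(root.findtext("size/height"))

    bboxes, labels = [], []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin = float(box.findtext("xmin"))
        ymin = float(box.findtext("ymin"))
        xmax = float(box.findtext("xmax"))
        ymax = float(box.findtext("ymax"))
        # Same order and scale as tensorflow_datasets bounding boxes
        bboxes.append([ymin / height, xmin / width, ymax / height, xmax / width])
        # Hypothetical mapping from class name to integer id
        labels.append(class_to_id[obj.findtext("name")])
    return filename, bboxes, labels
```

All boxes of one image end up in one sample, which is what the tutorial's pipeline expects (one element per image, not one per box).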
Can we get a follow-up here? I'm also interested in knowing how to load our own dataset in this example. Thanks
To keep the example from having a lot of code around processing and preparing the data, we use tensorflow-datasets to load the MS COCO dataset. If you wish to train it on your own dataset, you need to convert your data into the format that this function expects: https://github.com/keras-team/keras-io/blob/9dbe6aef1082e6ef320021ef710c016b712a1379/examples/vision/retinanet.py#L374. Along with this, you may also need to make the necessary changes to the tf.data pipeline.
you need to convert your data into a format that this function expects
I think this is exactly what needs to be covered. Maybe this tutorial is not the best place, to avoid a lot of code as you say, but a complementary article could be written and linked from here.
In my case, I couldn't preprocess my data :/
Exactly! It would help to have something that processes the data into a form close to what the function expects. I have made a few attempts, but with no success :(
It follows the format returned by the tensorflow_datasets builders. You can try matching it by doing something like this:
{
    "image": <image tensor>,
    "objects": {
        "bbox": [
            [y1, x1, y2, x2],
            [y1, x1, y2, x2],
            [y1, x1, y2, x2],
            [y1, x1, y2, x2],
            [y1, x1, y2, x2],
        ],
        "label": [
            class_id_for_object_1,
            class_id_for_object_2,
            class_id_for_object_3,
            class_id_for_object_4,
            class_id_for_object_5
        ]
    }
}
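One detail that is easy to miss: the tutorial's preprocess_data calls swap_xy on sample["objects"]["bbox"] and then multiplies the coordinates by the image width and height, so the boxes are expected to be normalized to [0, 1] and stored as [y1, x1, y2, x2], not in pixels. A minimal sketch of that conversion, assuming your annotations are pixel boxes (xmin, ymin, xmax, ymax); `to_tfds_bbox` and `make_sample` are hypothetical helper names:

```python
def to_tfds_bbox(xmin, ymin, xmax, ymax, width, height):
    """Convert one pixel-space box to normalized [ymin, xmin, ymax, xmax]."""
    return [ymin / height, xmin / width, ymax / height, xmax / width]


def make_sample(image, boxes_px, labels, width, height):
    """Build one sample dict with the nested layout shown above.

    boxes_px holds (xmin, ymin, xmax, ymax) pixel boxes; labels holds one
    integer class id per box, under the key "label" (not "id").
    """
    return {
        "image": image,
        "objects": {
            "bbox": [to_tfds_bbox(*box, width, height) for box in boxes_px],
            "label": list(labels),
        },
    }
```

Here `image` can be an image tensor or, if you decode later in the pipeline, a file path.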
I tried to make a dict like this:
import pandas as pd

file_location = '/content/drive/MyDrive/Skripsi/Labels/label_train.csv'
column_names = ["filename", "width", "height", "class_label", "xmin", "ymin", "xmax", "ymax"]
df = pd.read_csv(file_location, names=column_names)
df = df.drop([0])  # drop the CSV's own header row, since explicit names were passed

filenames = df.filename.to_list()
classes = df.class_label.to_list()
xmin = df.xmin.to_list()
ymin = df.ymin.to_list()
xmax = df.xmax.to_list()
ymax = df.ymax.to_list()

image = []
bbox = []
id = []
# for i in range(len(filenames)):
for i in range(5):
    ## Encode image
    # image.append(open(os.path.join(data, filenames[i]), 'rb').read())
    image.append(filenames[i])
    ## Bbox append (one box per CSV row)
    tmp = []
    tmp.append(float(xmin[i]))
    tmp.append(float(ymin[i]))
    tmp.append(float(xmax[i]))
    tmp.append(float(ymax[i]))
    bbox.append(tmp)
    print(bbox[i])
    ## Class prep
    if classes[i] == "Debris":
        id.append(2)
    else:
        id.append(1)

objects = {"bbox": bbox, "id": id}
train_dataset = {"image": image, "objects": objects}
So the resulting dict is:
{
'image': ['00708.jpg', '01289.jpg', '01441.jpg', '00327.jpg', '01460.jpg'],
'objects':
{
'bbox': [[277.0, 427.0, 416.0, 480.0],
[266.0, 1.0, 347.0, 61.0],
[385.0, 249.0, 451.0, 320.0],
[89.0, 431.0, 144.0, 462.0],
[433.0, 274.0, 457.0, 341.0]],
'id': [2, 2, 1, 1, 1]
}
}
Is this the right approach?
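Compared with the format above, the dict in this post differs in three ways: it treats each CSV row as a separate image instead of grouping all boxes of one image together, it stores boxes as pixel [xmin, ymin, xmax, ymax] instead of normalized [y1, x1, y2, x2], and it uses the key "id" where the tutorial reads "label". A minimal standard-library sketch that groups rows per image, assuming the same column layout and class mapping as the post (`load_samples` is a hypothetical name):

```python
import csv

# Column layout assumed from the post above; the CSV's own header row is skipped.
COLUMN_NAMES = ["filename", "width", "height", "class_label", "xmin", "ymin", "xmax", "ymax"]


def load_samples(csv_file):
    """Group CSV rows by image and return one tfds-style dict per image."""
    reader = csv.DictReader(csv_file, fieldnames=COLUMN_NAMES)
    next(reader)  # skip the header row, like df.drop([0]) in the post
    samples = {}  # dicts keep insertion order, so images stay in CSV order
    for row in reader:
        width, height = float(row["width"]), float(row["height"])
        sample = samples.setdefault(
            row["filename"],
            {"image": row["filename"], "objects": {"bbox": [], "label": []}},
        )
        # Normalized [ymin, xmin, ymax, xmax], as the tutorial's pipeline expects
        sample["objects"]["bbox"].append([
            float(row["ymin"]) / height,
            float(row["xmin"]) / width,
            float(row["ymax"]) / height,
            float(row["xmax"]) / width,
        ])
        # Same class mapping as the post: "Debris" -> 2, anything else -> 1
        sample["objects"]["label"].append(2 if row["class_label"] == "Debris" else 1)
    return list(samples.values())
```

Because images can have different numbers of boxes, the resulting per-image lists are ragged, which matters when turning them into a tf.data.Dataset.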
I have an approach that works. It's a bit of a hack, but it works just fine.
- Create three basic lists: one for the filenames, one for the bounding boxes, and one for the encoded labels.
- Convert each of these to a tf.constant.
- Create your dataset like so:
  train_dataset = tf.data.Dataset.from_tensor_slices({"filename": filename, "objects": {"label": label, "bbox": bbox}})
- Read the image from disk in the preprocess_data function, where it is processed as part of the pipeline, something like so:
def preprocess_data(sample):
    # swap_xy, random_flip_horizontal, resize_and_pad_image and convert_to_xywh
    # are the helper functions defined in the RetinaNet tutorial
    image_string = tf.io.read_file(sample["filename"])
    # channels=3 forces RGB output, even for grayscale JPEGs
    image = tf.image.decode_jpeg(image_string, channels=3)
    bbox = swap_xy(sample["objects"]["bbox"])
    class_id = tf.cast(sample["objects"]["label"], dtype=tf.int32)

    # I don't want to flip horizontally.
    # image, bbox = random_flip_horizontal(image, bbox)

    image, image_shape, _ = resize_and_pad_image(image)
    bbox = tf.stack(
        [
            bbox[:, 0] * image_shape[1],
            bbox[:, 1] * image_shape[0],
            bbox[:, 2] * image_shape[1],
            bbox[:, 3] * image_shape[0],
        ],
        axis=-1,
    )
    bbox = convert_to_xywh(bbox)
    return image, bbox, class_id
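The first three steps above are not shown as code in the post, so here is a minimal sketch of them under stated assumptions: the filenames, boxes and labels are hypothetical, there is exactly one box per image, and boxes are already normalized [ymin, xmin, ymax, xmax]. The key detail is the extra axis on the boxes ([num_images, 1, 4] rather than [num_images, 4]), which keeps each element's bbox rank 2 so that bbox[:, 0] in preprocess_data works.

```python
import tensorflow as tf

# Hypothetical data: one box per image, normalized to [0, 1], [ymin, xmin, ymax, xmax].
filenames = ["images/0001.jpg", "images/0002.jpg"]
bboxes = [[[0.1, 0.2, 0.5, 0.6]], [[0.3, 0.3, 0.9, 0.8]]]  # [num_images, 1, 4]
labels = [[1], [2]]  # [num_images, 1]

# Steps 1-2: wrap each list in a tf.constant.
filename = tf.constant(filenames)
bbox = tf.constant(bboxes, dtype=tf.float32)
label = tf.constant(labels, dtype=tf.int32)

# Step 3: slice along the first axis. Each element is one image whose "bbox"
# keeps shape [1, 4], so it can still be indexed as bbox[:, 0].
train_dataset = tf.data.Dataset.from_tensor_slices(
    {"filename": filename, "objects": {"label": label, "bbox": bbox}}
)
```

Mapping preprocess_data over this dataset then reads each file from disk; the paths must of course exist for that step.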
I tried to apply the changes here but then got this error:
ValueError: Index out of range using input dim 1; input has only 1 dims for '{{node while/strided_slice_13}} = StridedSlice[Index=DT_INT32, T=DT_FLOAT, begin_mask=3, ellipsis_mask=0, end_mask=1, new_axis_mask=0, shrink_axis_mask=0](while/concat_7, while/strided_slice_13/stack, while/strided_slice_13/stack_1, while/strided_slice_13/stack_2)' with input shapes: [4], [2], [2], [2] and with computed input tensors: input[3] = <1 1>.
It seems my bounding boxes are not suitable: each one is a 1-D vector, so it can't be indexed with matrix syntax such as [:, 0], which some of the functions here use.
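The shapes in the error message ([4] for the input) point to the same diagnosis: each element's bbox has rank 1, while the tutorial's helpers slice it as a [num_boxes, 4] matrix. A minimal NumPy illustration of the problem and of a reshape that fixes it (`as_box_matrix` is a hypothetical helper; inside a TF pipeline the analogous call would be tf.reshape(bbox, [-1, 4])):

```python
import numpy as np


def as_box_matrix(boxes):
    """Return boxes as a float32 array of shape [num_boxes, 4].

    A single box given as a flat [y1, x1, y2, x2] list becomes shape [1, 4];
    an existing [N, 4] input keeps its shape.
    """
    return np.asarray(boxes, dtype=np.float32).reshape(-1, 4)


single_box = [0.1, 0.2, 0.5, 0.6]

try:
    np.asarray(single_box)[:, 0]  # rank-1 array: NumPy raises IndexError here
except IndexError as err:
    print("1-D box:", err)

print(as_box_matrix(single_box)[:, 0])  # works: one y1 value per box
```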
@rroosshhaann Do you mind sharing your code for steps 1-3? Thank you
@lolikgiovi
Did you manage to solve the problem? Do you already have a version of this code for custom datasets?
I solved the custom data input problem!
If you follow this link, you'll find how to create a TensorFlow Dataset: https://www.tensorflow.org/datasets/add_dataset. Create this dataset first with your data. For this, you need a CSV file with all annotations, which you can simply map from your XML files.
Once you have created this dataset, you should have a mydataset.py file. There you need to adapt your FeaturesDict as follows:
{
    "image": tfds.features.Image(shape=(None, None, 3)),
    "objects": tfds.features.Sequence(
        {
            "bbox": tfds.features.BBoxFeature(),
            "label": tfds.features.ClassLabel(num_classes=1),
        }
    ),
}
Since I didn't need all features, I only included the basic ones needed for successful training. Next, the yield part needs to be adapted. I did it like this:
{
    "image": images_path / f"{image_id}.jpg",
    "objects": [
        {
            "bbox": tfds.features.BBox(
                int(row["ymin"]) / int(row["height"]),
                int(row["xmin"]) / int(row["width"]),
                int(row["ymax"]) / int(row["height"]),
                int(row["xmax"]) / int(row["width"]),
            ),
            "label": row['label'],
        },
    ],
}
If you then build it from the command line using tfds build, you should get the correctly built dataset as a folder in your TensorFlow datasets directory. Now you can simply change the dataset name in the load function, and everything should work just fine.
I hope this helps. It was quite difficult to achieve...
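The yield shown above puts a single object in each example. If your CSV has one row per box and some images contain several boxes, those rows need to be merged into one example per image, since tfds expects each example key to be unique. A minimal standard-library sketch of that grouping, assuming the column names used in the snippet above (image_id, width, height, xmin, ymin, xmax, ymax, label); `generate_examples` is a hypothetical stand-in for the builder's generator, and in a real builder each bbox tuple would be wrapped as tfds.features.BBox(*bbox):

```python
import csv
from collections import defaultdict


def generate_examples(csv_file, images_dir):
    """Yield (key, example) pairs with all objects of one image grouped together."""
    rows_by_image = defaultdict(list)
    for row in csv.DictReader(csv_file):
        rows_by_image[row["image_id"]].append(row)

    for image_id, rows in rows_by_image.items():
        objects = []
        for row in rows:
            height, width = float(row["height"]), float(row["width"])
            objects.append({
                # (ymin, xmin, ymax, xmax), normalized, the same field order as tfds BBox
                "bbox": (
                    float(row["ymin"]) / height,
                    float(row["xmin"]) / width,
                    float(row["ymax"]) / height,
                    float(row["xmax"]) / width,
                ),
                "label": row["label"],  # passed through unchanged, as in the post
            })
        # One unique key per image
        yield image_id, {"image": f"{images_dir}/{image_id}.jpg", "objects": objects}
```

Using float() rather than int() also tolerates CSVs that store fractional pixel coordinates.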
@rroosshhaann This works only when there's one bbox per image, right? When using this approach with multiple bboxes per image, I get the error message "Can't convert non-rectangular Python sequence to Tensor."
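That error comes from tf.constant: a nested Python list whose inner lists have different lengths is not rectangular, so it cannot become a dense tensor. One workaround (a different technique from the tf.constant approach above, not something proposed in this thread) is tf.data.Dataset.from_generator with an output_signature whose bbox shape is (None, 4), so each image can carry any number of boxes. This sketch assumes TF 2.4 or later (for output_signature); the samples are hypothetical:

```python
import tensorflow as tf

# Hypothetical samples: each image has a different number of boxes,
# normalized to [0, 1] as [ymin, xmin, ymax, xmax].
samples = [
    ("images/0001.jpg", [[0.1, 0.2, 0.5, 0.6]], [1]),
    ("images/0002.jpg", [[0.3, 0.3, 0.9, 0.8], [0.0, 0.0, 0.4, 0.4]], [2, 1]),
]


def sample_generator():
    # Yield one nested dict per image, matching the structure used above
    for filename, bbox, label in samples:
        yield {"filename": filename, "objects": {"bbox": bbox, "label": label}}


train_dataset = tf.data.Dataset.from_generator(
    sample_generator,
    output_signature={
        "filename": tf.TensorSpec(shape=(), dtype=tf.string),
        "objects": {
            # None: the number of boxes varies from image to image
            "bbox": tf.TensorSpec(shape=(None, 4), dtype=tf.float32),
            "label": tf.TensorSpec(shape=(None,), dtype=tf.int32),
        },
    },
)
```

After preprocess_data, the tutorial's padded_batch step pads the varying box counts within each batch, so the rest of the pipeline can stay as it is.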
I matched the tensorflow_datasets format described earlier in this thread. After the preprocess_data(), batch(2) and padded_batch functions, my data looks like this:
(<tf.Tensor: shape=(2, 896, 896, 3), dtype=float32, numpy= array([[[[0., 0., 0.], [0., 0., 0.], [0., 0., 0.], ..., [0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]]], dtype=float32)>, <tf.Tensor: shape=(2, 4, 4), dtype=float32, numpy= array([[[216.70493 , 518.21747 , 91.102036, 139.47879 ], [450.6239 , 193.12451 , 128.74966 , 128.74968 ], [450.6239 , 193.12451 , 128.74966 , 128.74968 ], [450.6239 , 193.12451 , 128.74966 , 128.74968 ]],
[[576.2461 , 246.96262 , 164.64172 , 164.64174 ], [576.2461 , 246.96262 , 164.64172 , 164.64174 ], [620.71826 , 486.96158 , 224.25342 , 194.15302 ], [399.30353 , 495.5043 , 116.384705, 102.51276 ]]], dtype=float32)>,
<tf.Tensor: shape=(2, 4), dtype=int32, numpy= array([[2, 2, 2, 2], [1, 1, 1, 1]])>)
But it won't work at this step:
train_dataset = train_dataset.map(
    label_encoder.encode_batch, num_parallel_calls=autotune
)
from this pipeline:
autotune = tf.data.experimental.AUTOTUNE
train_dataset = train_dataset.map(preprocess_data, num_parallel_calls=autotune)
train_dataset = train_dataset.shuffle(batch_size)
train_dataset = train_dataset.padded_batch(
    batch_size=batch_size, padding_values=(0.0, 1e-8, -1), drop_remainder=True
)
train_dataset = train_dataset.map(
    label_encoder.encode_batch, num_parallel_calls=autotune
)
This happens even when changing num_classes to something like 10 or less.
The error:
<ipython-input-26-92d21a062ba6>:250 encode_batch *
label = self._encode_sample(images_shape, gt_boxes[i], cls_ids[i])
<ipython-input-26-92d21a062ba6>:233 _encode_sample *
box_target = self._compute_box_target(anchor_boxes, matched_gt_boxes)
<ipython-input-26-92d21a062ba6>:215 _compute_box_target *
box_target = tf.concat(
D:\PYTHON\SII\env\lib\site-packages\tensorflow\python\framework\ops.py:870 __array__ **
" a NumPy call, which is not supported".format(self.name))
NotImplementedError: Cannot convert a symbolic Tensor (while/truediv_16:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported
The interesting part is that label_encoder.encode_batch
works just as expected on the data, but when I put it in train_dataset.map(label_encoder.encode_batch), it won't run.
Thank you
It just worked with TF 2.5.0.
@rpsantosa @X-F-Lpro could you provide a bit more detail about implementing RetinaNet on a custom dataset?
I can try to give you some hints on how to proceed if you tell me what you are trying to accomplish. Since I worked on software I may not publish, I can only give code examples. Is there a certain aspect that is not working for you or that you do not understand?
I have a CSV file with the following annotation format: "path to image","xmin","ymin","xmax","ymax","class ID","class name", and I am trying to load the data in the same format as the reference COCO dataset. I tried to do the same as 05vald0, but I got the same error as @lolikgiovi. I saw your solution in the comment above, but I couldn't understand the process of creating the dataset.
@X-F-Lpro I am working on object detection for small targets, so I will also change the anchor box sizes accordingly.
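On the anchor sizes: as far as I can tell, the tutorial's AnchorBox builds one set of anchors per pyramid level from an area, a list of aspect ratios and a list of scales, with height = sqrt(area / ratio) and width = area / height (its default areas appear to be 32² to 512²). The sketch below reproduces that arithmetic in plain Python so you can inspect the anchor sizes a smaller set of areas would give. `anchor_dims` is a hypothetical helper, and the halved areas are an illustrative assumption, not a recommendation:

```python
import math


def anchor_dims(areas, aspect_ratios=(0.5, 1.0, 2.0), scales=(2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3))):
    """Return, per level, a list of (width, height) anchor sizes.

    For each area and aspect ratio (width / height), height = sqrt(area / ratio)
    and width = area / height; each pair is then multiplied by every scale.
    """
    dims = []
    for area in areas:
        level = []
        for ratio in aspect_ratios:
            height = math.sqrt(area / ratio)
            width = area / height
            for scale in scales:
                level.append((scale * width, scale * height))
        dims.append(level)
    return dims


# Hypothetical smaller base sizes for small targets
small_areas = [x ** 2 for x in [16.0, 32.0, 64.0, 128.0, 256.0]]
for level, level_dims in enumerate(anchor_dims(small_areas)):
    print(level, [(round(w, 1), round(h, 1)) for w, h in level_dims[:3]])
```

The number of areas should still match the number of feature-pyramid levels (and their strides) the model produces.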