Machine-Learning-Collection
Inconsistent box format in YOLOv3 train.py
I'm just curious what the reasoning is behind using the corners box format in get_evaluation_bboxes and then using midpoint in mean_average_precision. My model achieves a mAP of 0.951 on my training data and 0.761 on my validation data, so it should be working, right? But when I test it on an image from the dataset it gets the classification correct every time, while the box is still slightly off, as you can see in this image.

YOLO works by classifying a specific bounding box, so it makes no sense that it would classify that bounding box as a pedestrian crosswalk with 93% certainty when there isn't even a sign present in that box. The only explanation I can think of is that I'm somehow messing up my scales when I get my bounding boxes, but I have no clue where that would be happening. After going back through the code I noticed what I mentioned at the beginning of the post, so some clarification on this would be appreciated.
To give y'all an idea of how I set my data up in my dataset class: I calculate the bounding box using the midpoint format, so it looks like:
bboxes = []
for index, values in group.iterrows():
    lab = self.labs.index(values["Annotation tag"])
    image = Image.open(self.path + values["Filename"])
    width, height = image.size
    # Normalize the corner coordinates to [0, 1]
    x1 = values["Upper left corner X"] / width
    x2 = values["Lower right corner X"] / width
    y1 = values["Upper left corner Y"] / height
    y2 = values["Lower right corner Y"] / height
    # Convert corners -> midpoint
    w, h = x2 - x1, y2 - y1
    x, y = (x2 + x1) / 2, (y2 + y1) / 2
    # x, y, width, height, class_label
    bboxes.append([x, y, w, h, lab])
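As a sanity check on that conversion, here is a quick round trip (my own helper names, not from the repo): converting normalized corners to midpoint and back should reproduce the original annotation, which at least rules out the dataset-side math.

```python
def corners_to_midpoint(x1, y1, x2, y2):
    # (x1, y1) = upper-left, (x2, y2) = lower-right, normalized to [0, 1]
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1

def midpoint_to_corners(x, y, w, h):
    # Inverse of the above: midpoint (x, y, w, h) -> corners (x1, y1, x2, y2)
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2

box = (0.2, 0.3, 0.6, 0.9)  # example normalized corner box
round_trip = midpoint_to_corners(*corners_to_midpoint(*box))
assert all(abs(a - b) < 1e-9 for a, b in zip(box, round_trip))
```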
The rest of the code is basically the same as the original dataset class. Now when I go to calculate mAP, my code looks like this:
pred_boxes, true_boxes = get_evaluation_bboxes(
    train_loader,
    model,
    iou_threshold=config.NMS_IOU_THRESH,
    anchors=config.ANCHORS,
    threshold=config.CONF_THRESHOLD,
)
maptrain = mean_average_precision(
    pred_boxes,
    true_boxes,
    iou_threshold=config.MAP_IOU_THRESH,
    box_format="midpoint",
    num_classes=config.NUM_CLASSES,
)
This is where the inconsistency I mentioned comes in: get_evaluation_bboxes defaults to the corners format, while mean_average_precision is set to use midpoint.
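To illustrate why that mismatch would matter, here is a minimal sketch in plain Python (not the repo's intersection_over_union): if a box stored in one format is fed to an IoU routine expecting the other, the overlap is silently computed wrong rather than raising an error, which would drag down every IoU-based step (NMS, mAP matching).

```python
def iou(box1, box2, box_format="midpoint"):
    # Convert midpoint (x, y, w, h) boxes to corners (x1, y1, x2, y2) first
    if box_format == "midpoint":
        box1 = (box1[0] - box1[2] / 2, box1[1] - box1[3] / 2,
                box1[0] + box1[2] / 2, box1[1] + box1[3] / 2)
        box2 = (box2[0] - box2[2] / 2, box2[1] - box2[3] / 2,
                box2[0] + box2[2] / 2, box2[1] + box2[3] / 2)
    # Both boxes are now corners: intersection over union as usual
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)

corners_box  = (0.2, 0.2, 0.6, 0.6)  # a box in corner format
midpoint_box = (0.4, 0.4, 0.4, 0.4)  # the *same* box in midpoint format

print(iou(midpoint_box, midpoint_box, "midpoint"))  # 1.0 (formats agree)
print(iou(corners_box, midpoint_box, "midpoint"))   # ~0.21 (format mismatch)
```

So a format mix-up wouldn't crash anything; it would just quietly make every overlap look smaller than it really is.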
Finally, the code I used to get the bounding boxes for the image above is:
scaled_anchors = (
    torch.tensor(C.ANCHORS)
    * torch.tensor(C.S).unsqueeze(1).unsqueeze(1).repeat(1, 3, 2)
).to(C.DEVICE)

image = image.to("cuda")
with torch.no_grad():
    model.eval()
    # Add a batch dimension: (C, H, W) -> (1, C, H, W)
    tmp = image.unsqueeze(0)
    out = model(tmp)
    bboxes = [[] for _ in range(tmp.shape[0])]  # one list per image in the batch
    for i in range(3):
        batch_size, A, S, _, _ = out[i].size()
        anchor = scaled_anchors[i]
        boxes_scale_i = cells_to_bboxes(out[i], anchor, S=S, is_preds=True)
        for idx, box in enumerate(boxes_scale_i):
            bboxes[idx] += box
    for i in range(batch_size):
        nms_boxes = non_max_suppression(
            bboxes[i], iou_threshold=C.NMS_IOU_THRESH, threshold=C.CONF_THRESHOLD
        )

# Draw the surviving boxes back onto the original (unresized) image
image = Image.open(image_path)
# image = image.resize((416, 416))
bboxes = nms_boxes
width, height = image.size
ids = pd.read_csv("lib/datasets/ids.csv")
draw = ImageDraw.Draw(image)
font = ImageFont.truetype("arial.ttf", 30)
for count, box in enumerate(bboxes):
    color = (100 + count * int(155 / 4), 0, 100 + count * int(155 / 4))
    name = ids[ids["ID"] == int(box[0])]["Name"].tolist()[0]
    # box = [class, confidence, x, y, w, h] with x, y, w, h as normalized midpoint
    x, y, w, h = box[2], box[3], box[4], box[5]
    left, right = (x - w / 2) * width, (x + w / 2) * width
    top, bottom = (y - h / 2) * height, (y + h / 2) * height
    left, right, top, bottom = int(left), int(right), int(top), int(bottom)
    draw.line(
        [(left, top), (left, bottom), (right, bottom), (right, top), (left, top)],
        width=2, fill=color,
    )
    # draw.text((left + 5, top - 30), name, fill=color, font=font)
image  # displays the result in a notebook
In this code I converted the predictions to bounding boxes using the midpoint format. I also tried the approach from plot_couple_examples and got the same result.
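For reference, here is the midpoint-to-pixel conversion from the drawing loop above pulled out into a standalone helper, with a worked example (my own naming, plain Python):

```python
def midpoint_to_pixel_corners(x, y, w, h, img_w, img_h):
    # Normalized midpoint box -> integer pixel corners on the target image
    left   = int((x - w / 2) * img_w)
    right  = int((x + w / 2) * img_w)
    top    = int((y - h / 2) * img_h)
    bottom = int((y + h / 2) * img_h)
    return left, top, right, bottom

# A box centered in a 416x416 image, covering half of it in each dimension:
print(midpoint_to_pixel_corners(0.5, 0.5, 0.5, 0.5, 416, 416))
# -> (104, 104, 312, 312)
```

One thing worth noting: these normalized coordinates are relative to whatever the network actually saw, so multiplying by the original image's width and height only lines up if the preprocessing was a plain stretch to 416×416, with no letterboxing or cropping.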
I think the implementation was explained very well, and based on the mAP scores the model works well too, but that means nothing if I can't get it to produce an accurate bounding box prediction.