facenet-pytorch

GenerateBoundingBox

Open FrancescoSaverioZuppichini opened this issue 3 years ago • 3 comments

Hi Tim,

Is it possible to know how this method works (https://github.com/timesler/facenet-pytorch/blob/dd0b0e4b5b124b599f75b87e570910e5d80c8848/models/utils/detect_face.py#L203)? I cannot understand how it handles batches of images, or the overall logic, and there are no comments.

Thank you!

Cheers,

Francesco

Hi @FrancescoSaverioZuppichini - I am also facing a similar issue. Have you figured this out? I want to generate a bounding box for detected faces using the MTCNN model. I am only able to get the coordinates of the bounding box using the mtcnn.detect() function. Please let me know if you can help. Thanks, Megh

meghbhalerao avatar Mar 27 '21 08:03 meghbhalerao

Nope mate :)

This question bothered me too. After some digging, and some help from the YIYAN LLM, I think the comments below may help other people. As a running example, let us say we pass in 2 images as a batch.

```python
import torch


def generateBoundingBox(reg, probs, scale, thresh):
    """Turn P-Net outputs into candidate boxes for a batch of images.

    Args:
        reg (torch.Tensor): [b, 4, h, w] box regression offsets. b is the number
            of images; h and w are the height and width after the conv ops in P-Net.
        probs (torch.Tensor): [b, h, w] face probabilities, same b, h, w as above.
        scale (float): a number like 0.6.
        thresh (float): a number like 0.7.

    Returns:
        boundingbox (torch.Tensor): [n, 9], where n is the number of positive
            cells across the whole batch. Columns are x1, y1, x2, y2, score and
            the 4 regression offsets; used for building the bbox later.
        image_inds (torch.Tensor): [n], the index of the image that each row of
            boundingbox belongs to.
    """
    stride = 2
    cellsize = 12

    reg = reg.permute(1, 0, 2, 3)  # [b, 4, h, w] -> [4, b, h, w]

    mask = probs >= thresh  # same shape as probs, True where a cell passes the threshold
    # [n, 3]: one (image, row, col) index per True cell. With 9 True cells this is [9, 3].
    mask_inds = mask.nonzero()
    # [n]: image index of each positive cell. With 2 images, 4 positives in the
    # first and 5 in the second, this is [0, 0, 0, 0, 1, 1, 1, 1, 1].
    image_inds = mask_inds[:, 0]
    score = probs[mask]  # [n]: probabilities of the positive cells, flattened
    # reg[:, mask] keeps the offsets of the positive cells, flattened like score: [4, n].
    # The permute then changes the shape to [n, 4].
    reg = reg[:, mask].permute(1, 0)
    # [n, 2]: unlike image_inds, this holds each positive cell's position in the
    # P-Net output map. flip(1) reorders the columns from (row, col) to (col, row),
    # i.e. (x, y); the shape stays [n, 2].
    bb = mask_inds[:, 1:].type(reg.dtype).flip(1)
    q1 = ((stride * bb + 1) / scale).floor()  # upper-left corner of every cell
    q2 = ((stride * bb + cellsize - 1 + 1) / scale).floor()  # lower-right corner of every cell
    # [n, 9]: concatenate the corner coordinates, the score and the refinement offsets
    boundingbox = torch.cat([q1, q2, score.unsqueeze(1), reg], dim=1)
    return boundingbox, image_inds
```
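
To see the batch handling concretely, here is a minimal sketch that repeats the same indexing steps on made-up P-Net outputs for 2 images. The tensor values and the 2x3 map size are invented for illustration, and `offsets` and `offsets_image_1` are names I chose. Nothing loops over images: the boolean mask flattens all positive cells of the batch into one list, and `image_inds` records which image each one came from.

```python
import torch

# Made-up P-Net outputs for a batch of 2 images on a 2x3 output map.
probs = torch.tensor([
    [[0.9, 0.1, 0.2],
     [0.3, 0.8, 0.1]],   # image 0: cells (0, 0) and (1, 1) pass
    [[0.1, 0.1, 0.95],
     [0.2, 0.1, 0.1]],   # image 1: cell (0, 2) passes
])
# Distinct values so each offset can be traced back to where it came from.
reg = torch.arange(2 * 4 * 2 * 3, dtype=torch.float32).reshape(2, 4, 2, 3)

mask = probs >= 0.7
mask_inds = mask.nonzero()      # rows of (image, row, col), in row-major order
image_inds = mask_inds[:, 0]    # which image each positive cell belongs to
offsets = reg.permute(1, 0, 2, 3)[:, mask].permute(1, 0)  # [n, 4]

print(mask_inds.tolist())   # [[0, 0, 0], [0, 1, 1], [1, 0, 2]]
print(image_inds.tolist())  # [0, 0, 1]

# Row k of offsets is reg[image, :, row, col] for the k-th positive cell.
for k, (b, r, c) in enumerate(mask_inds.tolist()):
    assert torch.equal(offsets[k], reg[b, :, r, c])

# Rows for one image can be recovered afterwards with a boolean mask.
offsets_image_1 = offsets[image_inds == 1]  # shape [1, 4]
```

The same `image_inds == i` selection works on the `[n, 9]` `boundingbox` tensor, which is how the rows can be split back per image later on.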

bf96163 avatar Dec 05 '23 07:12 bf96163
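
The corner arithmetic for `q1` and `q2` in the function above can also be checked by hand for a single cell. This is a minimal sketch in plain Python, assuming the same `stride = 2` and `cellsize = 12` as that code; `cell_to_box` is a hypothetical helper name, not part of facenet-pytorch.

```python
import math

STRIDE = 2     # same value as `stride` in generateBoundingBox
CELLSIZE = 12  # same value as `cellsize` in generateBoundingBox


def cell_to_box(row, col, scale):
    """Map one P-Net output cell to (x1, y1, x2, y2), mirroring q1 and q2."""
    # q1: upper-left corner. x comes from the column, y from the row.
    x1 = math.floor((STRIDE * col + 1) / scale)
    y1 = math.floor((STRIDE * row + 1) / scale)
    # q2: lower-right corner. `cellsize - 1 + 1` in the original is just cellsize.
    x2 = math.floor((STRIDE * col + CELLSIZE) / scale)
    y2 = math.floor((STRIDE * row + CELLSIZE) / scale)
    return x1, y1, x2, y2


print(cell_to_box(0, 0, 1.0))  # (1, 1, 12, 12)
print(cell_to_box(3, 5, 0.5))  # (22, 14, 44, 36)
```

Dividing by `scale` is what undoes the resize: with `scale = 0.5` every coordinate doubles compared with the unscaled case.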