facenet-pytorch
GenerateBoundingBox
Hi Tim,
Is it possible to know how this method works (https://github.com/timesler/facenet-pytorch/blob/dd0b0e4b5b124b599f75b87e570910e5d80c8848/models/utils/detect_face.py#L203)? I cannot understand how it handles batches of images, or the overall logic; there are no comments.
Thank you!
Cheers,
Francesco
Hi @FrancescoSaverioZuppichini - I am also facing a similar issue. Have you figured this out? I want to generate a bounding box for detected faces using the MTCNN model. I am only able to get the coordinates of the bounding box using the mtcnn.detect() function.
Please let me know if you can help.
Thanks,
Megh
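Not an answer from the thread, but for the drawing part of the question: once `mtcnn.detect()` has returned the `(x1, y1, x2, y2)` coordinates, drawing them is plain PIL. A minimal sketch, with dummy boxes standing in for the real `mtcnn.detect()` output:

```python
from PIL import Image, ImageDraw

def draw_boxes(img, boxes, outline="red", width=3):
    # Draw one rectangle per (x1, y1, x2, y2) box, in place, and return the image.
    draw = ImageDraw.Draw(img)
    for box in boxes:
        draw.rectangle(list(box), outline=outline, width=width)
    return img

# Dummy stand-ins; in practice: boxes, probs = mtcnn.detect(img)
img = Image.new("RGB", (160, 160), "white")
boxes = [(20.0, 30.0, 120.0, 140.0)]
draw_boxes(img, boxes)
```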
Nope mate :)
This question also bothered me. After some digging (and some help from the YIYAN LLM), I think these comments may help other people:
```python
# Say we feed a batch of 2 images through PNet.
import torch

def generateBoundingBox(reg, probs, scale, thresh):
    """
    Args:
        reg:    [b, 4, h, w] box-regression offsets; b is the batch size,
                h and w are the feature-map height/width after PNet's convs
        probs:  [b, h, w] face probability per feature-map cell
        scale:  image-pyramid scale, e.g. 0.6
        thresh: detection threshold, e.g. 0.7
    Returns:
        boundingbox: [n, 9] (x1, y1, x2, y2, score, 4 reg offsets), one row
                     per cell above threshold; used to build the bboxes later
        image_inds:  [n] which image in the batch each box belongs to
    """
    stride = 2
    cellsize = 12
    reg = reg.permute(1, 0, 2, 3)  # -> [4, b, h, w]
    mask = probs >= thresh         # [b, h, w] boolean, True where a face is likely
    mask_inds = mask.nonzero()     # [n, 3] coordinates (image, row, col) of the True cells
    image_inds = mask_inds[:, 0]   # [n] e.g. [0, 0, 0, 0, 1, 1, 1, 1, 1] if image 0 has 4 hits
    score = probs[mask]            # [n] flattened scores of the positive cells
    reg = reg[:, mask].permute(1, 0)  # keep only positive cells: [4, n] -> [n, 4]
    # (row, col) -> (x, y): flip(1) swaps the last two coordinates of each hit
    bb = mask_inds[:, 1:].type(reg.dtype).flip(1)
    # map feature-map cells back to input-image coordinates
    q1 = ((stride * bb + 1) / scale).floor()                 # upper-left corner of each cell's receptive field
    q2 = ((stride * bb + cellsize - 1 + 1) / scale).floor()  # lower-right corner
    boundingbox = torch.cat([q1, q2, score.unsqueeze(1), reg], dim=1)  # [n, 9]
    return boundingbox, image_inds
```
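To see the batching behaviour concretely, here is the same indexing logic re-sketched in NumPy on tiny dummy tensors (illustration only, not the library's code): two images, one above-threshold cell in each, so `image_inds` comes out as `[0, 1]`.

```python
import numpy as np

def generate_bounding_box_np(reg, probs, scale, thresh):
    # NumPy re-sketch of the batched indexing in generateBoundingBox.
    stride, cellsize = 2, 12
    mask = probs >= thresh                   # [b, h, w] boolean
    mask_inds = np.argwhere(mask)            # [n, 3]: (image, row, col) per hit
    image_inds = mask_inds[:, 0]             # which image each hit belongs to
    score = probs[mask]                      # [n] flattened scores
    reg = np.transpose(reg, (1, 0, 2, 3))    # [4, b, h, w]
    reg = reg[:, mask].T                     # [n, 4] offsets of the positive cells
    bb = mask_inds[:, 1:][:, ::-1].astype(reg.dtype)  # (row, col) -> (x, y)
    q1 = np.floor((stride * bb + 1) / scale)          # upper-left corners
    q2 = np.floor((stride * bb + cellsize) / scale)   # lower-right corners
    boxes = np.concatenate([q1, q2, score[:, None], reg], axis=1)  # [n, 9]
    return boxes, image_inds

# Two 3x3 "feature maps", one hit per image
probs = np.zeros((2, 3, 3))
probs[0, 1, 2] = 0.9   # image 0, row 1, col 2
probs[1, 0, 0] = 0.8   # image 1, row 0, col 0
reg = np.zeros((2, 4, 3, 3))
boxes, image_inds = generate_bounding_box_np(reg, probs, 0.6, 0.7)
```

For the first hit, `(row, col) = (1, 2)` flips to `(x, y) = (2, 1)`, and with `scale=0.6` the corners map back to `(8, 5)` and `(26, 23)` in the original image, which is exactly the stride/cellsize arithmetic in the torch version.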