LCFCN
How does the loss function work, and how is it actually implemented in PyTorch?
As per suggestion, I've opened a new issue so that others might also benefit from this.
Okay, after reading up on https://arxiv.org/abs/1506.02106 and watershed transforms, I kinda understand the losses.
How exactly are the blobs computed? (How are they stored and made use of for calculating the split-level loss?)
Blobs are computed in two steps (as shown in lines 33-46 in models.py):
- First we compute `pred_mask = self(images).data.max(1)[1].squeeze().cpu().numpy()`, which gets the argmax over the probability of each pixel across the K channels in the K x H x W mask, where K is the number of categories and H and W are the height and width.
As a result this gives you a binary matrix for each category. Note that the background is also a category.
- Then we apply the connected components algorithm (skimage) to get the blobs for each binary matrix:
`connected_components = morph.label(pred_mask == category_id)`
connected_components has each blob labeled with a different unique id. The number of unique ids is the number of blobs.
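If it helps, here is a minimal standalone sketch of those two steps (the tensor shapes and the category id are made up for illustration):

```python
import numpy as np
import torch
from skimage import morphology as morph

K, H, W = 3, 64, 64                        # hypothetical: 3 categories (incl. background)
logits = torch.randn(1, K, H, W)           # stand-in for self(images)

# Step 1: argmax over the K channels -> one predicted category per pixel
pred_mask = logits.data.max(1)[1].squeeze().cpu().numpy()   # H x W

# Step 2: connected components of one category's binary mask -> blobs
category_id = 1
blobs = morph.label(pred_mask == category_id)   # H x W, each blob gets a unique id
n_blobs = len(np.unique(blobs)) - 1             # minus 1 for the 0 (background) label
```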
What does it mean when `blobs[None]` is returned? I've never seen this anywhere before. (Also `counts[None]`.)
`[None]` just adds a dimension: if blobs is K x H x W, then blobs[None] is 1 x K x H x W.
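For example (the shape is arbitrary):

```python
import numpy as np

blobs = np.zeros((3, 64, 64))    # K x H x W
print(blobs.shape)               # (3, 64, 64)
print(blobs[None].shape)         # (1, 3, 64, 64), i.e. 1 x K x H x W
```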
I went through all the code in models.py and I don't understand much. Particularly, in class FCN8 I understand the computation up to fc7, but I get lost in the semantic segmentation part.
```python
scores = self.scoring_layer(fc7)
upscore2 = self.upscore2(scores)
# second
score_pool4 = self.score_pool4(pool4)
score_pool4c = score_pool4[:, :, 5:5+upscore2.size(2),
                                 5:5+upscore2.size(3)]
upscore_pool4 = self.upscore_pool4(score_pool4c + upscore2)
# third
score_pool3 = self.score_pool3(pool3)
score_pool3c = score_pool3[:, :, 9:9+upscore_pool4.size(2),
                                 9:9+upscore_pool4.size(3)]
output = self.upscore8(score_pool3c + upscore_pool4)
return output[:, :, 31:(31 + h), 31:(31 + w)].contiguous()
```
What exactly is happening here? Also, how many outputs does the model have? It's supposed to output the blobs, and it's also supposed to output the locations of the detected objects, right?
I'm getting confused as to what the output of the model is and how exactly the blobs are handled. (Apologies if I'm getting back to the same thing again; I'm having a tough time wrapping my head around this.)
The segmentation part you showed is the upsampling path, which combines different features from VGG16 to output a K x H x W matrix, where K is the number of classes and H and W are the image height and width. This procedure is described more fully in the first deep-learning-based segmentation paper, which introduces FCN8: https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
At this point, there are no blobs, just activations for each pixel for each class. Hope this helps.
No, I mean more specifically, what does
`score_pool4c = score_pool4[:, :, 5:5+upscore2.size(2), 5:5+upscore2.size(3)]`
do? Why 5, 9, and 31?
And how are blobs computed from this again?
- The 5, 9, and 31 are there to take care of the shifting due to the max-pooling layers of VGG16.
- Once you get the output above (which is K x H x W), you apply the following two functions to get the blobs:
# Get the labels across the channels 'K' which is the number of classes
pred_mask = output.argmax(1).squeeze().cpu().numpy()
# Get the blobs for category 'k' as follows:
blobs_k = morph.label(pred_mask==k)
Hey, I've been looking at your code for a while now and zeroed in on the parts I don't understand (there are other parts I don't understand either, but they require me to understand the former first).
- In get_blob_dict in losses.py:
```python
blob_uniques, blob_counts = np.unique(class_blobs * (points_mask), return_counts=True)  # <--- THIS
uniques = np.delete(np.unique(class_blobs), blob_uniques)  # <--- THIS

for u in uniques:  # iterate over falsely predicted blobs
    blobList += [{"class": l, "label": u, "n_points": 0, "size": 0, "pointsList": []}]
    n_fp += 1

for i, u in enumerate(blob_uniques):
    if u == 0:
        continue
    pointsList = []
    blob_ind = class_blobs == u
    locs = np.where(blob_ind * (points_mask))  # <--- THIS
    for j in range(locs[0].shape[0]):
        pointsList += [{"y": locs[0][j], "x": locs[1][j]}]
```
i) I'm guessing that blob_uniques now contains the points that are inside the predicted blobs?
ii) I don't understand why np.delete(). I'm guessing it's to ignore the points that are inside correct blobs? I read up on the docs for np.delete() and I don't think it's doing what it's supposed to be doing.
iii) I also don't understand what np.where() is supposed to be doing. Again, the docs don't corroborate what's supposed to be happening there.
- In compute_image_loss in losses.py:
ones = torch.ones(Counts.size(0), 1).long().cuda()
BgFgCounts = torch.cat([ones, Counts], 1)
Target = (BgFgCounts.view(n*k).view(-1) > 0).view(-1).float()
Smax = S.view(n, k, h*w).max(2)[0].view(-1)
I don't understand how you get the target values, because when I tried to simulate that piece of code (with respect to the Trancos dataset, which has only two classes, background and foreground), ones and BgFgCounts don't have the same number of dimensions. Also, is there any reason you flatten Target twice?
- In compute_fp_loss() in losses.py
T = np.ones(blobs.shape[-2:]) #FLAG
T[blobs[b["class"]] == b["label"]] = 0
I'm not entirely sure what's happening here either.
I understand a lot of these questions might be very trivial 😅 but you have no idea how much your help is appreciated. Thanks again 😃.
- i. `blob_uniques` corresponds to all the unique blobs that intersect with the point-annotations. The false positive blobs are those that do not intersect with the point-annotations, which is what `uniques = np.delete(np.unique(class_blobs), blob_uniques)` gives us.
ii. `numpy.delete(arr, obj, axis=None)` removes the entries of `arr` at the indices given in `obj` (here the blob ids double as indices, since the ids are consecutive integers starting at 0). Like you said, it's to ignore the points that are inside correct blobs.
iii. `np.where()` returns the x- and y-coordinates of the points. In this case, `locs = np.where(blob_ind * (points_mask))` returns the locations of the point-annotations that intersect with the predicted blobs. (A toy example of all three calls is sketched after this list.)
- i. Trancos only has two classes. `BgFgCounts` consists of the background label (which is 1 for all datasets) and whether other objects exist. Since Trancos images always have a car in them, `BgFgCounts = [1, 1]` for all images. (A toy version of this computation is sketched after this list.)
ii. There is no reason for flattening `Target` twice, you are right.
- For the compute_fp_loss, `b` corresponds to a blob with no points inside it, i.e. `b["n_points"] == 0`, which is a false positive blob. Therefore, `T[blobs[b["class"]] == b["label"]] = 0` gets the indices where the false positive blob is and sets them to zero. Then `F.nll_loss(S_log, torch.LongTensor(T).cuda()[None], ...)` sets those blobs to background using the cross-entropy loss. (A rough sketch follows below.)
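In case it helps, here is a toy example of the blob bookkeeping from get_blob_dict (the blob map and point mask below are made up):

```python
import numpy as np

# Hypothetical 4x5 blob map with three blobs (ids 1, 2, 3) and two point-annotations.
class_blobs = np.array([[0, 1, 1, 0, 2],
                        [0, 1, 0, 0, 2],
                        [3, 0, 0, 0, 0],
                        [3, 3, 0, 0, 0]])
points_mask = np.zeros_like(class_blobs)
points_mask[0, 1] = 1      # point inside blob 1
points_mask[3, 0] = 1      # point inside blob 3

blob_uniques, blob_counts = np.unique(class_blobs * points_mask, return_counts=True)
# blob_uniques -> [0, 1, 3]: the blob ids that intersect a point (plus the 0 background)

uniques = np.delete(np.unique(class_blobs), blob_uniques)
# np.unique(class_blobs) -> [0, 1, 2, 3]; removing the entries at indices [0, 1, 3]
# leaves [2], the false-positive blob (this works because blob ids run consecutively from 0)

locs = np.where((class_blobs == 1) * points_mask)
# locs -> (array([0]), array([1])): the (y, x) location of the point inside blob 1
```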
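And a CPU-only toy version of the image-level target computation (the batch size, class count, counts, and probabilities are made up, and the .cuda() calls are dropped):

```python
import torch

n, k, h, w = 2, 2, 8, 8                       # hypothetical batch / classes / spatial size
Counts = torch.tensor([[3], [0]]).long()      # per-image object counts for the foreground class
S = torch.rand(n, k, h, w)                    # stand-in for the softmax output

ones = torch.ones(Counts.size(0), 1).long()
BgFgCounts = torch.cat([ones, Counts], 1)     # tensor([[1, 3], [1, 0]])
Target = (BgFgCounts.view(-1) > 0).float()    # tensor([1., 1., 1., 0.])
Smax = S.view(n, k, h * w).max(2)[0].view(-1) # highest probability per (image, class) pair
```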
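Finally, a rough sketch of the false-positive masking idea in compute_fp_loss (the blob map, the shapes, and the ignore_index choice are assumptions for illustration, not the repo's exact call):

```python
import numpy as np
import torch
import torch.nn.functional as F

# Hypothetical blob map for one class; pretend blob id 2 has no point inside it (a false positive).
blobs_k = np.array([[0, 1, 0, 2],
                    [0, 1, 0, 2],
                    [0, 0, 0, 0],
                    [0, 0, 0, 0]])
T = np.ones(blobs_k.shape)   # 1 everywhere ...
T[blobs_k == 2] = 0          # ... except the false-positive blob, whose target is background (0)

S_log = torch.log_softmax(torch.randn(1, 2, 4, 4), dim=1)   # fake log-probabilities, 2 classes
loss = F.nll_loss(S_log, torch.LongTensor(T)[None],
                  ignore_index=1, reduction='mean')          # assumption: pixels labeled 1 are ignored,
                                                             # so only the FP blob's pixels contribute
```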
Hope this helps!
PS: In case it's helpful, you can put a breakpoint as `import ipdb; ipdb.set_trace()` at any line and observe how the arrays behave across the program.
Thanks a lot! I didn't know about ipdb, I'll make use of it.
Hey, just to clarify: in models.py, ResFCN, you're first reducing the size using interpolate and then, in the end, you're increasing the size in the last interpolate? Cause I'm guessing logits_16s_spatial_dim will be smaller than that of 32s, and the 8s spatial dimensions would be smaller than those of 16s?

`logits_32s` is the original image size divided by 32. `logits_16s` is the original image size divided by 16. `logits_8s` is the original image size divided by 8. So `logits_8s` is the largest one.
The interpolate code you showed above is the upsampling path of the network.
`logits_32s` gets resized to the size of `logits_16s`, and then added to `logits_16s`; `logits_16s` gets resized to the size of `logits_8s`, and then added to `logits_8s`; and finally `logits_8s` gets resized to the size of the original image.
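A shape-only sketch of that upsampling path (the input size, interpolation mode, and random tensors are just for illustration; in the real model these logits come from the backbone):

```python
import torch
import torch.nn.functional as F

H, W = 256, 256                                   # hypothetical input size
logits_32s = torch.randn(1, 2, H // 32, W // 32)  # smallest feature map
logits_16s = torch.randn(1, 2, H // 16, W // 16)
logits_8s  = torch.randn(1, 2, H // 8,  W // 8)   # largest feature map

logits_16s = logits_16s + F.interpolate(logits_32s, size=logits_16s.shape[2:],
                                        mode='bilinear', align_corners=False)
logits_8s  = logits_8s  + F.interpolate(logits_16s, size=logits_8s.shape[2:],
                                        mode='bilinear', align_corners=False)
output = F.interpolate(logits_8s, size=(H, W), mode='bilinear', align_corners=False)
print(output.shape)   # torch.Size([1, 2, 256, 256]) -> back to the input size
```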
Ah, yes, I got it. Got a bit confused with the variable names themselves cause I initialized them wrong. Thanks a lot!
I've run into a few more questions that unfortunately ipdb couldn't answer (thanks a lot for that btw, it was very helpful).
- Isn't the false positive loss basically a part of the point-level loss when it is calculated the first time around? Meaning you are penalizing the network twice as much for false positives, isn't it?
- Also, I don't quite get the places where you've used `blobs[b['class']] == b['label']`. Okay, I mostly understand this as you're getting the blobs of a particular class and then working with them. Took me a while to figure this out. So in Trancos, b['class'] will always be 0, right? Since there is only one other class besides background?
- I don't have a formal education in digital image processing, so this one might be a bit trivial. What does black_tophat in watersplit() do exactly (with respect to the probabilities)? I read up on it and it seems to be a way to contrast important objects, but I see you're using it with the probabilities and I'm not sure what to make of it.
So what I understand from split_level_loss is that you're getting the boundaries between objects (inside the blob) and then setting them to background (0) and then performing nll_loss against the output of the network.
Please correct me if I am wrong.
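For reference, a rough self-contained illustration of that idea (this is not the repo's watersplit code; the probability map, the window size, and the seed points below are all made up): seed a watershed at the point-annotations over a black-tophat-filtered probability map, then take the boundaries between the resulting regions as the split locations.

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed, find_boundaries

# Fake foreground-probability map with two bright bumps, and one point annotation per bump.
probs = np.zeros((40, 40))
probs[10:20, 10:20] = 0.9
probs[10:20, 22:32] = 0.9
points = np.zeros((40, 40), dtype=int)
points[15, 15] = 1      # seed for the first object
points[15, 27] = 2      # seed for the second object

# black_tophat emphasises the dark valley between the bumps, so the watershed splits there.
probs_tophat = ndimage.black_tophat(probs, 7)
regions = watershed(probs_tophat, points)
boundaries = find_boundaries(regions)   # True on the pixels separating the two regions

# Per the description above, these boundary pixels would then get target 0 (background) in the nll_loss.
```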