LCFCN
How does the loss function work, and how is it actually implemented in PyTorch?
As per suggestion, I've opened a new issue so that others might also benefit from this.
Okay, after reading up on https://arxiv.org/abs/1506.02106 and watershed transforms, I kinda understand the losses.
How exactly are the blobs computed? (How are they stored and made use of for calculating the split-level loss?)
Blobs are computed in two steps (as shown in lines 33-46 in models.py):
- First we compute `pred_mask = self(images).data.max(1)[1].squeeze().cpu().numpy()`, which gets the argmax over the probability of each pixel across the K channels in the K x H x W mask, where K is the number of categories and H and W are the height and width.
As a result this gives you a binary matrix for each category. Note that the background is also a category.
- Then we apply the connected components algorithm (skimage) to get the blobs for each binary matrix:
`connected_components = morph.label(pred_mask == category_id)`
connected_components has each blob labeled with a different unique id. The number of unique ids is the number of blobs.
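If it helps, here is a minimal standalone sketch of those two steps (the tensor shapes and the category id are made up for illustration):

```python
import numpy as np
import torch
from skimage import morphology as morph

K, H, W = 3, 64, 64                        # hypothetical: 3 categories (incl. background)
logits = torch.randn(1, K, H, W)           # stand-in for self(images)

# Step 1: argmax over the K channels -> one predicted category per pixel
pred_mask = logits.data.max(1)[1].squeeze().cpu().numpy()   # H x W

# Step 2: connected components of one category's binary mask -> blobs
category_id = 1
blobs = morph.label(pred_mask == category_id)   # H x W, each blob gets a unique id
n_blobs = len(np.unique(blobs)) - 1             # minus 1 for the 0 (background) label
```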
What does it mean when `blobs[None]` is returned? I've never seen this anywhere before. (Also `counts[None]`.)
`[None]` just adds a dimension: if blobs is K x H x W, then blobs[None] is 1 x K x H x W.
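For example (the shape is arbitrary):

```python
import numpy as np

blobs = np.zeros((3, 64, 64))    # K x H x W
print(blobs.shape)               # (3, 64, 64)
print(blobs[None].shape)         # (1, 3, 64, 64), i.e. 1 x K x H x W
```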
I went through all the code in models.py and I don't understand much. Particularly, in class FCN8 I understand the computation up to fc7, but I get lost in the semantic segmentation part.
```python
scores = self.scoring_layer(fc7)
upscore2 = self.upscore2(scores)
# second
score_pool4 = self.score_pool4(pool4)
score_pool4c = score_pool4[:, :, 5:5+upscore2.size(2),
                                 5:5+upscore2.size(3)]
upscore_pool4 = self.upscore_pool4(score_pool4c + upscore2)
# third
score_pool3 = self.score_pool3(pool3)
score_pool3c = score_pool3[:, :, 9:9+upscore_pool4.size(2),
                                 9:9+upscore_pool4.size(3)]
output = self.upscore8(score_pool3c + upscore_pool4)
return output[:, :, 31:(31 + h), 31:(31 + w)].contiguous()
```
What exactly is happening here? Also, how many outputs does the model have? It's supposed to output the blobs, and it's also supposed to output the locations of the detected objects, right?
I'm getting confused as to what the output of the model is and how exactly the blobs are handled. (Apologies if I'm getting back to the same thing again; I'm having a tough time wrapping my head around this.)
The segmentation part you showed is the upsampling path, which combines different features from VGG16 to output a K x H x W matrix, where K is the number of classes and H and W are the image height and width. This procedure is described more fully in the first deep-learning-based segmentation paper, which introduces FCN8: https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
At this point, there are no blobs, just activations for each pixel for each class. Hope this helps.
No, I mean more specifically, what does
`score_pool4c = score_pool4[:, :, 5:5+upscore2.size(2), 5:5+upscore2.size(3)]`
do? Why 5, 9, and 31?
And how are blobs computed from this again?
- The 5, 9, and 31 are there to take care of the shifting due to the max-pooling layers of VGG16.
- Once you get the output above (which is K x H x W), you apply the following two functions to get the blobs:
# Get the labels across the channels 'K' which is the number of classes
pred_mask = output.argmax(1).squeeze().cpu().numpy()
# Get the blobs for category 'k' as follows:
blobs_k = morph.label(pred_mask==k)
Hey, I've been looking at your code for a while now and zeroed in on the parts I don't understand (there are other parts I don't understand either, but they require me to understand the former first).
- In get_blob_dict in losses.py:
```python
blob_uniques, blob_counts = np.unique(class_blobs * (points_mask), return_counts=True)  # <--- THIS
uniques = np.delete(np.unique(class_blobs), blob_uniques)  # <--- THIS

for u in uniques:  # iterate over falsely predicted blobs
    blobList += [{"class": l, "label": u, "n_points": 0, "size": 0, "pointsList": []}]
    n_fp += 1

for i, u in enumerate(blob_uniques):
    if u == 0:
        continue
    pointsList = []
    blob_ind = class_blobs == u
    locs = np.where(blob_ind * (points_mask))  # <--- THIS
    for j in range(locs[0].shape[0]):
        pointsList += [{"y": locs[0][j], "x": locs[1][j]}]
```
i) I'm guessing that blob_uniques now contains the points that are inside the predicted blobs?
ii) I don't understand why np.delete(). I'm guessing it's to ignore the points that are inside correct blobs? I read up on the docs for np.delete() and I don't think it's doing what it's supposed to be doing.
iii) I also don't understand what np.where() is supposed to be doing. Again, the docs don't corroborate what's supposed to be happening there.
- In compute_image_loss in losses.py:
ones = torch.ones(Counts.size(0), 1).long().cuda()
BgFgCounts = torch.cat([ones, Counts], 1)
Target = (BgFgCounts.view(n*k).view(-1) > 0).view(-1).float()
Smax = S.view(n, k, h*w).max(2)[0].view(-1)
I don't understand how you get the target values, because when I tried to simulate that piece of code (with respect to the Trancos dataset, which has only two classes, background and foreground), ones and BgFgCounts don't have the same number of dimensions. Also, is there any reason you flatten Target twice?
- In compute_fp_loss() in losses.py
T = np.ones(blobs.shape[-2:]) #FLAG
T[blobs[b["class"]] == b["label"]] = 0
I'm not entirely sure what's happening here either.
I understand a lot of these questions might be very trivial 😅 but you have no idea how much your help is appreciated. Thanks again 😃.
- i. `blob_uniques` corresponds to all the unique blobs that intersect with the point-annotations. The false positive blobs are those that do not intersect with the point-annotations, which is what `uniques = np.delete(np.unique(class_blobs), blob_uniques)` gives us.
ii. `numpy.delete(arr, obj, axis=None)` removes the entries of `arr` at the indices given in `obj` (here the blob ids double as indices, since the ids are consecutive integers starting at 0). Like you said, it's to ignore the points that are inside correct blobs.
iii. `np.where()` returns the x- and y-coordinates of the points. In this case, `locs = np.where(blob_ind * (points_mask))` returns the locations of the point-annotations that intersect with the predicted blobs. (A toy example of all three calls is sketched after this list.)
- i. Trancos only has two classes. `BgFgCounts` consists of the background label (which is 1 for all datasets) and whether other objects exist. Since Trancos images always have a car in them, `BgFgCounts = [1, 1]` for all images. (A toy version of this computation is sketched after this list.)
ii. There is no reason for flattening `Target` twice, you are right.
- For the compute_fp_loss, `b` corresponds to a blob with no points inside it, i.e. `b["n_points"] == 0`, which is a false positive blob. Therefore, `T[blobs[b["class"]] == b["label"]] = 0` gets the indices where the false positive blob is and sets them to zero. Then `F.nll_loss(S_log, torch.LongTensor(T).cuda()[None], ...)` sets those blobs to background using the cross-entropy loss. (A rough sketch follows below.)
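In case it helps, here is a toy example of the blob bookkeeping from get_blob_dict (the blob map and point mask below are made up):

```python
import numpy as np

# Hypothetical 4x5 blob map with three blobs (ids 1, 2, 3) and two point-annotations.
class_blobs = np.array([[0, 1, 1, 0, 2],
                        [0, 1, 0, 0, 2],
                        [3, 0, 0, 0, 0],
                        [3, 3, 0, 0, 0]])
points_mask = np.zeros_like(class_blobs)
points_mask[0, 1] = 1      # point inside blob 1
points_mask[3, 0] = 1      # point inside blob 3

blob_uniques, blob_counts = np.unique(class_blobs * points_mask, return_counts=True)
# blob_uniques -> [0, 1, 3]: the blob ids that intersect a point (plus the 0 background)

uniques = np.delete(np.unique(class_blobs), blob_uniques)
# np.unique(class_blobs) -> [0, 1, 2, 3]; removing the entries at indices [0, 1, 3]
# leaves [2], the false-positive blob (this works because blob ids run consecutively from 0)

locs = np.where((class_blobs == 1) * points_mask)
# locs -> (array([0]), array([1])): the (y, x) location of the point inside blob 1
```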
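And a CPU-only toy version of the image-level target computation (the batch size, class count, counts, and probabilities are made up, and the .cuda() calls are dropped):

```python
import torch

n, k, h, w = 2, 2, 8, 8                       # hypothetical batch / classes / spatial size
Counts = torch.tensor([[3], [0]]).long()      # per-image object counts for the foreground class
S = torch.rand(n, k, h, w)                    # stand-in for the softmax output

ones = torch.ones(Counts.size(0), 1).long()
BgFgCounts = torch.cat([ones, Counts], 1)     # tensor([[1, 3], [1, 0]])
Target = (BgFgCounts.view(-1) > 0).float()    # tensor([1., 1., 1., 0.])
Smax = S.view(n, k, h * w).max(2)[0].view(-1) # highest probability per (image, class) pair
```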
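Finally, a rough sketch of the false-positive masking idea in compute_fp_loss (the blob map, the shapes, and the ignore_index choice are assumptions for illustration, not the repo's exact call):

```python
import numpy as np
import torch
import torch.nn.functional as F

# Hypothetical blob map for one class; pretend blob id 2 has no point inside it (a false positive).
blobs_k = np.array([[0, 1, 0, 2],
                    [0, 1, 0, 2],
                    [0, 0, 0, 0],
                    [0, 0, 0, 0]])
T = np.ones(blobs_k.shape)   # 1 everywhere ...
T[blobs_k == 2] = 0          # ... except the false-positive blob, whose target is background (0)

S_log = torch.log_softmax(torch.randn(1, 2, 4, 4), dim=1)   # fake log-probabilities, 2 classes
loss = F.nll_loss(S_log, torch.LongTensor(T)[None],
                  ignore_index=1, reduction='mean')          # assumption: pixels labeled 1 are ignored,
                                                             # so only the FP blob's pixels contribute
```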
Hope this helps!
PS: In case it's helpful, you can put a breakpoint as `import ipdb; ipdb.set_trace()` at any line and observe how the arrays behave across the program.
Thanks a lot! I didn't know about ipdb, I'll make use of it.
Hey, just to clarify: in models.py, ResFCN, you're first reducing the size using interpolate and then, in the end, you're increasing the size in the last interpolate? Cause I'm guessing logits_16s_spatial_dim will be smaller than that of 32s, and the 8s spatial dimensions would be smaller than those of 16s?

`logits_32s` is the original image size divided by 32. `logits_16s` is the original image size divided by 16. `logits_8s` is the original image size divided by 8. So `logits_8s` is the largest one.
The interpolate code you showed above is the upsampling path of the network.
`logits_32s` gets resized to the size of `logits_16s`, and then added to `logits_16s`; `logits_16s` gets resized to the size of `logits_8s`, and then added to `logits_8s`; and finally `logits_8s` gets resized to the size of the original image.
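A shape-only sketch of that upsampling path (the input size, interpolation mode, and random tensors are just for illustration; in the real model these logits come from the backbone):

```python
import torch
import torch.nn.functional as F

H, W = 256, 256                                   # hypothetical input size
logits_32s = torch.randn(1, 2, H // 32, W // 32)  # smallest feature map
logits_16s = torch.randn(1, 2, H // 16, W // 16)
logits_8s  = torch.randn(1, 2, H // 8,  W // 8)   # largest feature map

logits_16s = logits_16s + F.interpolate(logits_32s, size=logits_16s.shape[2:],
                                        mode='bilinear', align_corners=False)
logits_8s  = logits_8s  + F.interpolate(logits_16s, size=logits_8s.shape[2:],
                                        mode='bilinear', align_corners=False)
output = F.interpolate(logits_8s, size=(H, W), mode='bilinear', align_corners=False)
print(output.shape)   # torch.Size([1, 2, 256, 256]) -> back to the input size
```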
Ah, yes, I got it. Got a bit confused with the variable names themselves cause I initialized them wrong. Thanks a lot!
I've run into a few more questions that unfortunately ipdb couldn't answer (thanks a lot for that btw, it was very helpful).
- Isn't the false positive loss basically a part of the point-level loss when it is calculated the first time around? Meaning you are penalizing the network twice as much for false positives, isn't it?
- Also, I don't quite get the places where you've used `blobs[b['class']] == b['label']`. Okay, I mostly understand this as you're getting the blobs of a particular class and then working with them. Took me a while to figure this out. So in Trancos, b['class'] will always be 0, right? Since there is only one other class besides background?
- I don't have a formal education in digital image processing, so this one might be a bit trivial. What does black_tophat in watersplit() do exactly (with respect to the probabilities)? I read up on it and it seems to be a way to contrast important objects, but I see you're using it with the probabilities and I'm not sure what to make of it.
So what I understand from split_level_loss is that you're getting the boundaries between objects (inside the blob) and then setting them to background (0) and then performing nll_loss against the output of the network.
Please correct me if I am wrong.
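For reference, a rough self-contained illustration of that idea (this is not the repo's watersplit code; the probability map, the window size, and the seed points below are all made up): seed a watershed at the point-annotations over a black-tophat-filtered probability map, then take the boundaries between the resulting regions as the split locations.

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed, find_boundaries

# Fake foreground-probability map with two bright bumps, and one point annotation per bump.
probs = np.zeros((40, 40))
probs[10:20, 10:20] = 0.9
probs[10:20, 22:32] = 0.9
points = np.zeros((40, 40), dtype=int)
points[15, 15] = 1      # seed for the first object
points[15, 27] = 2      # seed for the second object

# black_tophat emphasises the dark valley between the bumps, so the watershed splits there.
probs_tophat = ndimage.black_tophat(probs, 7)
regions = watershed(probs_tophat, points)
boundaries = find_boundaries(regions)   # True on the pixels separating the two regions

# Per the description above, these boundary pixels would then get target 0 (background) in the nll_loss.
```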