pdq_evaluation

Help in debugging an error

Open tjiagoM opened this issue 4 years ago • 5 comments

Hi!

@david2611 I know I still have the PR to conclude, but in the meantime I've been hitting my head on the keyboard with a bug which I was still not able to solve. I isolated one bounding box in which this problem happens to be easier to explain here.

The error is being thrown in gen_single_heatmap() from data_holders.py when my input to that function is:

img_size = (333, 500)
mean = [-1.0, 47.51200000000006]
cov = array([[0. , 0. ], [0. , 0.00176]])

The calculations in this function leave positions completely empty, with shape (0, 1, 2), and passing that to g.cdf throws the following exception: ValueError: Cannot apply_along_axis when any iteration dimensions are 0

I'm not sure what the objective of this function is, so I'd appreciate it if you could give me at least some pointers on what you are trying to do here and which part of the PDQ paper it corresponds to.

If it helps, the PBoxDetInst on which the calculations are being done before calling gen_single_heatmap() has the following attributes:

covs: array([[[1.76000001e-03, 2.52979994e-01], [2.52979994e-01, 1.28846497e+02]], [[1.76000001e-03, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00]]])

box: [337.292, 302.658, 451.48799999999994, 333.0]

Initially I thought that the bounding box I was trying to predict had the coordinate x1 over the limits of the image (box[0] > img_size[0]) but I was wrong, as img_size is (height, width).

So I'm really lost here and don't seem to be able to solve this. Any help would be very much appreciated, for example helping me understand what is being calculated here and what it is for.

In any case, do you have any idea why this might be happening or where the issue might be? :/ The code was working perfectly fine before, when I had a deterministic model without any covariances, but after I changed it to use MC-Dropout, this started to happen.

tjiagoM avatar Jul 24 '20 21:07 tjiagoM

Hi @tjiagoM ,

just taken a quick glance at it (I haven't run any experiments to test my thinking but hopefully this gets you on the right track).

First, w.r.t. the PDQ paper, this is part of the process of generating the probabilistic detection heatmap, like the one you would see in Figure 1, and it is outlined in Section 4. However, the process is a little hard to follow because of steps we have had to take in the name of optimization.

The goal is to create two cumulative distribution probability heatmaps, one for each corner, and then multiply those two heatmaps together to create the final probability distribution. The step you are currently in (gen_single_heatmap) creates one of the cumulative distribution heatmaps. This heatmap essentially gives the probability that a point in the image would be inside a box with a corner generated by the Gaussian defined by the input to the function.
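To make the idea concrete, here is a minimal sketch of the two-heatmap product. It is not the code in data_holders.py: it assumes diagonal covariances with strictly positive variances, so each 2-D CDF factorises into 1-D CDFs, and the names corner_heatmap and box_heatmap are made up for illustration.

```python
import math


def normal_cdf(x, mean, std):
    """1-D Gaussian CDF. Assumes std > 0 (illustrative simplification)."""
    if std <= 0:
        raise ValueError("std must be positive in this sketch")
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def corner_heatmap(img_size, mean, stds, top_left):
    """Per-pixel probability that the pixel lies on the inner side of a corner.

    img_size is (height, width); mean and stds are (x, y) pairs.
    Assumes a diagonal covariance, so the joint CDF is a product of 1-D CDFs.
    """
    height, width = img_size
    heatmap = []
    for y in range(height):
        row = []
        for x in range(width):
            px = normal_cdf(x, mean[0], stds[0])
            py = normal_cdf(y, mean[1], stds[1])
            if not top_left:
                # For the bottom-right corner the pixel must lie before the corner.
                px, py = 1.0 - px, 1.0 - py
            row.append(px * py)
        heatmap.append(row)
    return heatmap


def box_heatmap(img_size, tl_mean, tl_stds, br_mean, br_stds):
    """Multiply the two corner heatmaps element-wise."""
    tl = corner_heatmap(img_size, tl_mean, tl_stds, top_left=True)
    br = corner_heatmap(img_size, br_mean, br_stds, top_left=False)
    return [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(tl, br)]
```

For example, box_heatmap((5, 6), (1.0, 1.0), (0.5, 0.5), (4.0, 3.0), (0.5, 0.5)) gives values close to 1 for pixels well inside the box and close to 0 near the image corner at (0, 0).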

From your error I would think that the issue is the values of positions you are getting, which (given the array is empty) would be screwing things up. To me that suggests the values you are getting from find_roi are invalid or unusual. find_roi is an optimization step in which we try to find a region of interest wherein the cumulative probabilities will change most (we chose a rectangular region with pixels all within a Mahalanobis distance of at least 4). We do this to reduce the number of pixels subjected to the g.cdf function, which can take a while.

What I suspect may be happening is that you have a mean just outside the image (-1) and there is a region of interest which exists entirely outside the image boundary and that may be screwing with positions and then that in turn messes with everything else.
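As a rough illustration of that failure mode, the sketch below clips an axis-aligned region around the mean to the image and reports when nothing is left. This is not find_roi itself, whose implementation is not shown in this thread: clip_roi_to_image is a hypothetical helper, it uses per-axis standard deviations instead of a full Mahalanobis distance, and it assumes mean and stds are ordered like img_size.

```python
import math


def clip_roi_to_image(mean, stds, img_size, num_stds=4.0):
    """Return inclusive pixel ranges per axis, or None if the region misses the image.

    Hypothetical helper: mean, stds and img_size are assumed to share one axis order.
    """
    ranges = []
    for m, s, size in zip(mean, stds, img_size):
        lo = max(0, math.floor(m - num_stds * s))
        hi = min(size - 1, math.ceil(m + num_stds * s))
        if lo > hi:
            # The region lies entirely outside the image along this axis.
            return None
        ranges.append((lo, hi))
    return tuple(ranges)


# Values taken from the report above: zero variance on the axis whose mean is -1.
roi = clip_roi_to_image(
    mean=(-1.0, 47.51200000000006),
    stds=(0.0, math.sqrt(0.00176)),
    img_size=(333, 500),
)
if roi is None:
    print("ROI is empty; skip the CDF evaluation for this corner")
```

A guard like this would turn the empty positions array into an explicit, early check instead of an exception raised deep inside the CDF call.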

I can't say for certain without digging deeper, but I hope this sets you on the right path. If you are still having issues, feel free to message again, but my response will be more delayed than this quick note of my thoughts.

david2611 avatar Jul 25 '20 02:07 david2611

Thank you so much @david2611, that explanation really helped! The main issue was indeed that, for the bounding box in my example (and a few others), my operations were putting some coordinates exactly on the edge of the image (in my example, pixel 333 on one axis), so that coordinate should have had index 332 instead of 333. After I corrected the way I scale images, this avoided the "-1" you mentioned.

Now the issue is that some of my covariance matrices are not positive semidefinite, so I'm getting an error when gen_single_heatmap tries to create the multivariate Gaussian density function. My final question here is whether you had a similar issue in your paper's experiments when calculating the means/covariances from the sampled bounding boxes in the MC-Dropout SSD model. I do recognise that this is a symptom that my model needs some further fine-tuning, but I'd guess this might happen once in a while with MC-Dropout? I'm curious to know whether you had this type of issue and, if so, what your usual approach was.
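For reference, one generic way to detect and patch such a matrix is sketched below. It is not taken from pdq_evaluation or the paper: is_psd_2x2 and repair_cov_2x2 are hypothetical helpers for 2x2 corner covariances, and the repair (flooring the variances and shrinking the off-diagonal term) is just one possible workaround that changes the matrix, so it should be used with care.

```python
import math


def is_psd_2x2(cov, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff both diagonal entries and the determinant are >= 0."""
    (a, b), (c, d) = cov
    return abs(b - c) <= tol and a >= -tol and d >= -tol and a * d - b * c >= -tol


def repair_cov_2x2(cov, min_var=1e-6):
    """Return a PSD matrix close to cov (illustrative workaround, not a nearest projection)."""
    (a, b), (c, d) = cov
    a = max(a, min_var)             # floor the variances
    d = max(d, min_var)
    b = 0.5 * (b + c)               # symmetrise the off-diagonal term
    limit = math.sqrt(a * d)
    b = max(-limit, min(limit, b))  # enforce b^2 <= a * d
    return [[a, b], [b, d]]


bad = [[1.0, 2.0], [2.0, 1.0]]      # determinant is -3, so not PSD
print(is_psd_2x2(bad), is_psd_2x2(repair_cov_2x2(bad)))
```

The first covariance posted at the top of this thread passes the check as it stands, so a repair like this would leave it unchanged.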

Thanks once again for your time!

tjiagoM avatar Jul 26 '20 09:07 tjiagoM

I would like to add something else on this issue. I think that a coordinate of exactly 333 (like in my example) should not be considered incorrect (ending up as -1, like in my example). Given that the coordinates come in float format, a value of 0 means the top-left corner of a pixel, and a value of 1 (width) means the top-right corner of that same pixel. Thus, if my image has 333 pixels in width, a coordinate of 333 should not be incorrect, given that theoretically the bounding box goes around the pixels. In other words, if my image has a width of 1, x1=0 would mean the left of the pixel, and x1=1 would mean the right of the pixel. I know that for practical reasons we need to print them in actual pixels, but their actual semantic value, given they come as floats, is not that.

tjiagoM avatar Jul 26 '20 19:07 tjiagoM

Thanks again for your comments and apologies for the exceedingly delayed responses from me. You make a very good point and fence post issues came up frequently when we first came up with this code.

At the time, I believe we treated the bounding box as the inclusive coordinates of the extremes of an object. In the case of a single-pixel image, the box would have coordinates (0,0) and (0,0), meaning that the object included but did not go beyond the pixel (0,0) in any direction.
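The two conventions discussed here differ by one at the far edge. A small sketch of the conversion from continuous edge coordinates to inclusive pixel indices follows; edges_to_inclusive_indices is a hypothetical name and not part of the evaluation code.

```python
import math


def edges_to_inclusive_indices(start, end):
    """Map a continuous interval measured along pixel edges to inclusive pixel indices.

    Under the edge convention, pixel i spans [i, i + 1), so an interval ending
    exactly at `end` last touches pixel ceil(end) - 1.
    """
    first = math.floor(start)
    last = math.ceil(end) - 1
    # A zero-width interval is mapped to the single pixel it starts in.
    return first, max(first, last)


print(edges_to_inclusive_indices(0.0, 1.0))        # single-pixel image
print(edges_to_inclusive_indices(302.658, 333.0))  # y-range of the box reported above
```

Under this mapping, a continuous coordinate of 333 on an axis with 333 pixels corresponds to inclusive index 332, which is consistent with the fix described earlier in the thread.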

I will concede though that this is a bit of a conceptual issue and I should probably go back and rework things, including the treatment of probabilities that a box is generated outside of the image, but for simplicity this is how we created things at the time. I would be curious to hear if this has been a repeated issue for you in your research or whether you have solved this problem by this point.

david2611 avatar Sep 21 '20 00:09 david2611

Hi! At the end of the day, given that this was happening for just a handful of cases, I ended up manually cropping bounding boxes in a way that would work given the errors I explained before (if they were below/above the limits, I'd move them to the closest valid coordinate). Initially I was trying to go deep into your code to solve this by myself, but as, again, it was just a handful of cases, I took the "easy" road.
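A cropping step of that kind could look roughly like the sketch below. It is only a guess at the approach described, not the code actually used: clamp_box is a hypothetical helper that assumes boxes are [x1, y1, x2, y2] and img_size is (height, width), with inclusive pixel coordinates.

```python
def clamp_box(box, img_size):
    """Clamp [x1, y1, x2, y2] to the inclusive pixel range of a (height, width) image."""
    height, width = img_size
    x1, y1, x2, y2 = box

    def clamp(value, upper):
        # Move out-of-range values to the closest valid coordinate.
        return min(max(value, 0.0), float(upper))

    return [
        clamp(x1, width - 1),
        clamp(y1, height - 1),
        clamp(x2, width - 1),
        clamp(y2, height - 1),
    ]


# The box from the first post: only y2 (333.0) is outside the inclusive range.
print(clamp_box([337.292, 302.658, 451.48799999999994, 333.0], (333, 500)))
```

Clamping discards any probability mass that the detector placed outside the image, so it is a pragmatic fix for a handful of boxes and not a general treatment of the issue.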

Regarding the conceptual discussion of what coordinates should/shouldn't make sense, I just want to mention that I've used an implementation of yolov3, for which I even (mistakenly) tried to create a PR here: https://github.com/ultralytics/yolov3/pull/1414 (just in case you want to check the semantics of bboxes from a widely used yolov3 implementation).

Thanks so much for your time!!

tjiagoM avatar Sep 21 '20 19:09 tjiagoM