BreakoutDetection icon indicating copy to clipboard operation
BreakoutDetection copied to clipboard

Question regarding ForwardUpdate

Open torfsen opened this issue 10 years ago • 3 comments

I'm currently going through the source and I'm having trouble to understand how interval tree A is updated in ForwardUpdate: As far as I understand it, the idea is to move the location candidate one step to the right (Line 198) and then update the trees by removing distances related to the point on the left which is now outside the window and by adding distances related to the point on the right which is now inside the window.

Since the window for tree A contains min_size points, I think we should add and remove min_size - 1 distances from tree A, respectively. However, as far as I understand lines 201-219, the code actually does the following:

  • Add the min_distance - 1 distances from Z[tau1 - min_size] - Z[tau1 - 1] through Z[tau1 - 2] - Z[tau1 - 1] (inclusive). So far so good.
  • Remove the min_distance (!) distances from Z[tau1 - min_size] - Z[tau1 - min_size - 1] through Z[tau1 - 1] - Z[tau1 - min_size - 1] (inclusive). I don't think that last distance should be removed here, since Z[tau1 - 1] is the new point and Z[tau1 - min_size - 1] is the point that was just dropped.
  • Add the single distance Z[tau1 - min_size - 1] - Z[tau1 - min_size]. I don't think that's correct: this is a distance related to the point we just dropped, so it should be removed. In addition, we just did remove it in the previous step!

I therefore think that the loop in line 208 should have the limit i < tau1 - 1 (just as the loop before it) and that the addition of the single distance after it (lines 215-219) should be removed.

Is this reasoning correct or did I miss something?

torfsen avatar Oct 23 '15 21:10 torfsen

Sorry about the delayed response. I've been rather busy these last few weeks.

I agree with you analysis in the first two points. However, as mentioned in the third we add back the point that was previously (incorrectly) removed. So I don't see where the issue is. I think the reason it was implemented this way was for ease of understanding. If the loop limits were changed it would result in a small computational improvement but it would make matching up the code and the corresponding parts of the research paper more difficult.

It has been a while since I've looked at this project but I am pretty sure that was our reasoning. Hopefully this was able to answer your question.

On Fri, Oct 23, 2015 at 2:12 PM, Torf [email protected] wrote:

I'm currently going through the source and I'm having trouble to understand how interval tree A is updated in ForwardUpdate https://github.com/twitter/BreakoutDetection/blob/master/src/edmTail.cpp#L194: As far as I understand it, the idea is to move the location candidate one step to the right (Line 198 https://github.com/twitter/BreakoutDetection/blob/master/src/edmTail.cpp#L198) and then update the trees by removing distances related to the point on the left which is now outside the window and by adding distances related to the point on the right which is now inside the window.

Since the window for tree A contains min_size points, I think we should add and remove min_size - 1 distances from tree A, respectively. However, as far as I understand lines 201-219 https://github.com/twitter/BreakoutDetection/blob/master/src/edmTail.cpp#L201-L219, the code actually does the following:

  • Add the min_distance - 1 distances from Z[tau1 - min_size] - Z[tau1
  • 1] through Z[tau1 - 2] - Ztau1 - 1. So far so good.
  • Remove the min_distance (!) distances from Z[tau1 - min_size] - Z[tau1 - min_size - 1] through Z[tau1 - 1] - Ztau1 - min_size - 1. I don't think that last distance should be removed here, since Z[tau1
  • 1] is the new point and Z[tau1 - min_size - 1] is the point that was just dropped.
  • Add the single distance Z[tau1 - min_size - 1] - Z[tau1 - min_size]. I don't think that's correct: this is a distance related to the point we just dropped, so it should be removed. In addition, we just did remove it in the previous step!

I therefore think that the loop in line 208 https://github.com/twitter/BreakoutDetection/blob/master/src/edmTail.cpp#L208 should have the limit i < tau1 - 1 (just as the loop before it) and that the addition of the single distance after it (lines 215-219 https://github.com/twitter/BreakoutDetection/blob/master/src/edmTail.cpp#L215-L219) should be removed.

Is this reasoning correct or did I miss something?

— Reply to this email directly or view it on GitHub https://github.com/twitter/BreakoutDetection/issues/18.

  • Nicholas James

putnam120 avatar Oct 26 '15 15:10 putnam120

Thanks for your quick response, Nicholas! I still think that there's a bug, though, because the problematic distance in step 2 is not the same as the one in step 3:

The code removes Z[tau1 - 1] - Z[tau1 - min_size - 1] (step 2) but doesn't remove Z[tau1 - min_size - 1] - Z[tau1 - min_size] (or, more precisely, the distance is removed in step 2 and added back in step 3). Both of these are incorrect, as far as I understand: Z[tau1 - 1] - Z[tau1 - min_size - 1] hasn't been added before and should therefore not be removed, and Z[tau1 - min_size - 1] - Z[tau1 - min_size] refers to the dropped point Z[tau1 - min_size - 1] and therefore should be removed.

Please forgive my persistence, but the paper is a bit fuzzy on the details here so I'm trying to make sure I understand it correctly.

torfsen avatar Oct 26 '15 15:10 torfsen

This must be a bug. It looked fishy to me as well was happy to find a discussion already in the issue.

utkrist avatar Jul 09 '16 14:07 utkrist