
Improve tag stability, counter loss and flipping

Open qbonnard opened this issue 10 years ago • 18 comments

I haven't investigated yet, but the Z-axis seems to flip on estimate3d-gui, especially when the camera is almost perpendicular to the tag.

This issue is a reminder to investigate more ;)

qbonnard avatar Feb 26 '14 10:02 qbonnard

I've noticed it as well. This is definitely a bug. If I'm correct, the Z axis should, on the contrary, be very stable.

severin-lemaignan avatar Feb 26 '14 10:02 severin-lemaignan

I suspect it has to do with the order of the corners... Guilty until proven innocent.

qbonnard avatar Feb 26 '14 11:02 qbonnard

Sounds like a good suspect, indeed.

severin-lemaignan avatar Feb 26 '14 11:02 severin-lemaignan

Sorry to bring bad news, guys, but I think I've figured it out. It's probably the same phenomenon as the famous upside-down optical illusion (or whatever it's called). I think this picture summarizes the situation perfectly: http://www.mindmotivations.com/images/optical-illusion1.jpg The order of the corners is the same in both the up and down cases.

I guess the only way to solve this problem with a single tag is to estimate the perspective of the tag well enough to distinguish whether the "up" corner is closer to us than the "down" corner. Currently, the only way to do this is to check whether the line between the upper-left and upper-right corners is longer or shorter than the line between the lower-left and lower-right corners. This becomes more and more noise-prone as the tag gets flatter and flatter in perspective, i.e. all lines get shorter, hence the flipping we observe. Notice that the flipping diminishes and then stops as you bring the camera closer and closer to the tag, and doesn't happen at all when the tag is "looking towards" the camera, i.e. not too flat.
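For illustration, a minimal C++/OpenCV sketch of that length comparison. The corner order (top-left, top-right, bottom-right, bottom-left) and the function name are assumptions for this example, not the actual Chilitags convention:

    // Sketch only: guess which horizontal edge of the quad is closer to the
    // camera by comparing projected edge lengths. The corner order
    // (TL, TR, BR, BL) is an assumption of this example.
    #include <opencv2/core.hpp>
    #include <cmath>

    static float edgeLength(cv::Point2f a, cv::Point2f b) {
        cv::Point2f d = b - a;
        return std::hypot(d.x, d.y);
    }

    bool topEdgeLooksCloser(const cv::Point2f corners[4]) {
        float topLen    = edgeLength(corners[0], corners[1]); // TL -> TR
        float bottomLen = edgeLength(corners[3], corners[2]); // BL -> BR
        // Under perspective, the closer edge projects longer. When the tag is
        // nearly edge-on, the two lengths differ by only a few noisy pixels,
        // which is exactly where the Z axis starts to flip.
        return topLen > bottomLen;
    }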

In my application, I plan to work around this by using multiple tags whose relative poses are fixed and known beforehand, all referenced to the camera, plus outlier detection and elimination.

ayberkozgur avatar Apr 17 '14 12:04 ayberkozgur

There is no bug, as we're currently not doing anything wrong. This will be more of an enhancement if we manage to solve this some other way, so I'm changing the labels.

ayberkozgur avatar Apr 17 '14 13:04 ayberkozgur

Hum, interesting.

Considering the following tag:


           A
             ,'._
            /    `._
           /        `._
         ,'            `._
        /                 `.
       /                    `-.  D
      /                        ;-
    ,'                        /
   _                        ,'
 B  `._                    /
       `.                ,'
         `-.           ,'
            `-.       /
               `-.  ,'
                  `` C

What about computing the cross product of AB and AD and comparing it with the cross product of CB and CD, to check which of the two corners is closer to us?
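A sketch of that check on the 2D corner coordinates, assuming the corner labels A, B, C, D of the drawing above. The absolute value of each 2D cross product is twice the projected area of the corresponding triangle, and the corner whose triangle projects larger is presumably the closer one; names are illustrative only:

    #include <opencv2/core.hpp>
    #include <cmath>

    // z-component of the 2D cross product (p1 - p0) x (p2 - p0); its absolute
    // value is twice the area of the triangle p0-p1-p2 in image space.
    static float cross2d(cv::Point2f p0, cv::Point2f p1, cv::Point2f p2) {
        cv::Point2f u = p1 - p0, v = p2 - p0;
        return u.x * v.y - u.y * v.x;
    }

    // Returns true if corner A projects a larger triangle than corner C,
    // i.e. A is presumably the corner closer to the camera.
    bool cornerALooksCloser(cv::Point2f A, cv::Point2f B,
                            cv::Point2f C, cv::Point2f D) {
        float areaABD = std::abs(cross2d(A, B, D)); // |AB x AD|
        float areaCBD = std::abs(cross2d(C, B, D)); // |CB x CD|
        return areaABD > areaCBD;
    }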

severin-lemaignan avatar Apr 22 '14 15:04 severin-lemaignan

I drew some geometric diagrams to convince myself. If my reasoning is right, this is due to slight misdetection of the corner locations on the screen (e.g. due to pixel resolution, lighting, etc.), which would give the same result no matter which method of calculation we use. Furthermore, finding the +Z axis should come down to calculating the cross product (B_world_coordinate - A_world_coordinate) x (D_world_coordinate - A_world_coordinate). This really amounts to getting the A, B, D world coordinates right, which is already what taking that cross product relies on.

But if there is misdetection only on the ABD triangle and the BDC triangle is clean (e.g. because C is closer or somehow gets more light), we can use the cross product (B_world_coordinate - C_world_coordinate) x (D_world_coordinate - C_world_coordinate) instead. However, if there is misdetection in the ABD triangle, A may appear closer to us than B and D; and since BDC is clean, C also appears closer to us than B and D. So it might just be the case that A appears even closer than C, making our AB x AD vs. CB x CD check useless.

I currently see two "solutions":

  • Take all 4 possible cross products and vote or take a weighted average according to some confidence metric, or keep one or more and discard the others according to some metric (see the sketch below)
  • Check the order of closeness of the corners to the camera; if it is such that A > C > B ~ D (the situation above, where ~ means the order doesn't matter), report an error or do not report the tag at all

If the corners appear such that C > B ~ D > A while the actual order is A > B ~ D > C, we're pretty much out of luck.
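As an illustration of the first option, a rough sketch that combines the normals of all four corner triangles (in camera-frame coordinates), with equal weights standing in for a real confidence metric and all four normals oriented consistently around the quad. The input array and function name are hypothetical, not Chilitags API:

    #include <opencv2/core.hpp>

    // cornersCam: estimated 3D corner positions in the camera frame,
    // assumed in order A, B, C, D going around the quad.
    cv::Vec3f votedTagNormal(const cv::Vec3f cornersCam[4]) {
        cv::Vec3f sum(0, 0, 0);
        for (int i = 0; i < 4; ++i) {
            const cv::Vec3f &p    = cornersCam[i];
            const cv::Vec3f &prev = cornersCam[(i + 3) % 4];
            const cv::Vec3f &next = cornersCam[(i + 1) % 4];
            // Normal of this corner's triangle, oriented consistently
            // around the quad.
            cv::Vec3f n = (next - p).cross(prev - p);
            double len = cv::norm(n);
            if (len > 1e-9)
                sum += n * (1.0 / len); // unit normals so each corner votes equally
        }
        return sum; // direction of the consensus +Z (not normalized)
    }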

ayberkozgur avatar Apr 30 '14 08:04 ayberkozgur

Please note that this whole issue is also caused by the actual bending of the paper. On second thought, the major culprit is probably the bending of the paper, which means that the above method (voting over the 4 corners) could actually work.

ayberkozgur avatar Apr 30 '14 11:04 ayberkozgur

After discussion, the current proposal is:

  • single marker: return a result only if we are confident about the perspective
  • more than one tag: select the Z direction of each tag such that the camera position is consistent between all tags (consensus on the camera position; a rough sketch follows)
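To make the second point concrete, a rough sketch of the camera-position consensus, assuming each tag yields two candidate camera positions (one per Z sign) already expressed in a common world frame thanks to the known tag layout. All names and types are hypothetical, not Chilitags API:

    #include <opencv2/core.hpp>
    #include <vector>

    struct TagCandidates {
        cv::Vec3d cameraIfZUp;    // camera position if this tag's +Z is kept
        cv::Vec3d cameraIfZDown;  // camera position if this tag's +Z is flipped
    };

    // Returns, for each tag, true if +Z should be kept, false if it should be
    // flipped, by choosing the candidate closest to the consensus position.
    std::vector<bool> consensusZ(const std::vector<TagCandidates> &tags) {
        std::vector<bool> keepZUp;
        if (tags.empty()) return keepZUp;

        // First pass: crude consensus = mean of all candidates of all tags.
        // A real implementation would rather use only the confident tags.
        cv::Vec3d consensus(0, 0, 0);
        for (const auto &t : tags) consensus += t.cameraIfZUp + t.cameraIfZDown;
        consensus *= 1.0 / (2.0 * tags.size());

        // Second pass: per tag, keep the candidate nearest to the consensus.
        for (const auto &t : tags) {
            keepZUp.push_back(cv::norm(t.cameraIfZUp  - consensus) <=
                              cv::norm(t.cameraIfZDown - consensus));
        }
        return keepZUp;
    }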

severin-lemaignan avatar May 16 '14 14:05 severin-lemaignan

Suits me well.

ayberkozgur avatar May 16 '14 14:05 ayberkozgur

FYI, my initial experiments on the Cellulo side suggest that median filtering on a time window is very robust against flipping.

ayberkozgur avatar May 21 '14 17:05 ayberkozgur

Very interesting... That could very well replace the average filtering, and "fix" the z-flipping issue more simply than the proposal above. The advantage is that it also works with a single tag; the disadvantage is that it needs a few frames... which is OK, because tag flipping implies that several frames are available anyway.

So you just take the median of the last values for each component of the transformation matrix, or is it a bit fancier? How big is your time window?

qbonnard avatar May 22 '14 20:05 qbonnard

It is a bit fancier :) Here is what I do:

Get the translations (3-vectors) and rotations (quaternions) in their own windows and calculate their respective medians. The median in more than one dimension is defined as the "geometric median": the point in space whose sum of Euclidean distances to the window points is smallest. The catch is that the geometric median has neither a closed-form formula nor an exact algorithm in general, but it is known to be the minimization of a convex function. I calculate it with the Weiszfeld-Ostresh algorithm, which is iterative and basically a case of gradient descent, which might be off-putting in terms of performance. I've had runs that converged in 5 iterations and runs that converged in 20 iterations. It can be tuned by setting the initial point to the mean and playing with the step size.
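For reference, a minimal C++/OpenCV sketch of the Weiszfeld iteration for the translation window (3-vectors), with the mean as the initial point. This is only an illustration of the algorithm described above, not the actual Cellulo code:

    #include <opencv2/core.hpp>
    #include <vector>

    cv::Vec3d geometricMedian(const std::vector<cv::Vec3d> &window,
                              int maxIterations = 20, double epsilon = 1e-6) {
        if (window.empty()) return cv::Vec3d(0, 0, 0);

        // Start from the arithmetic mean, as suggested above.
        cv::Vec3d estimate(0, 0, 0);
        for (const auto &p : window) estimate += p;
        estimate *= 1.0 / window.size();

        for (int it = 0; it < maxIterations; ++it) {
            cv::Vec3d numerator(0, 0, 0);
            double denominator = 0.0;
            for (const auto &p : window) {
                double d = cv::norm(p - estimate);
                if (d < epsilon) continue;        // skip points at the current estimate
                numerator += p * (1.0 / d);       // each point weighted by 1/distance
                denominator += 1.0 / d;
            }
            if (denominator == 0.0) break;        // all points coincide with the estimate
            cv::Vec3d next = numerator * (1.0 / denominator);
            if (cv::norm(next - estimate) < epsilon) { estimate = next; break; }
            estimate = next;
        }
        return estimate;
    }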

This works well for 3-vectors, but for quaternions you need to treat them as points on a Riemannian manifold for it to work. This is a bit beyond my mathematical knowledge, but it again turns out to be the minimization of a convex function. They only need special treatment, such as a different distance measure and different maps.

Once you have both medians, stick them into a transform matrix and you're good to go. You can find my implementations here (there are also references to the papers I got the algorithms from): https://github.com/ayberkozgur/libgdx/blob/master/gdx/src/com/badlogic/gdx/math/Quaternion.java https://github.com/ayberkozgur/libgdx/blob/master/gdx/src/com/badlogic/gdx/math/Vector3.java https://github.com/ayberkozgur/libgdx/blob/master/gdx/src/com/badlogic/gdx/math/Matrix4.java

ayberkozgur avatar May 23 '14 07:05 ayberkozgur

By the way, I used a 10-sample window, but I think it can be lowered a bit more.

And, this can also be applied to the scale (3-vector) of a transform matrix. Scale doesn't make sense in the Chilitags world but I just wanted to put it out there.

ayberkozgur avatar May 23 '14 07:05 ayberkozgur

Time for this issue to rise from the dead:

During the NCCR meetings, I had the chance to talk to a guy from ETHZ's Agile and Dexterous Robotics Lab who implemented a similar tag-based application (based on another tag library). They use a Kalman filter on the tag pose, plus IMU data when available, to counter the flipping issue as well as the loss of the tag due to blurry camera images. He said that they are getting very good results from this. I think @severin-lemaignan also mentioned trying a Kalman filter at some point. He said the code is open source too. We should definitely look at this at some point; it would be very cheap to compute.
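As a starting point, a hedged sketch of a constant-velocity Kalman filter on the tag translation using OpenCV's cv::KalmanFilter. The noise magnitudes are placeholder values and the IMU fusion of the ETHZ approach is not shown:

    // State is [x, y, z, vx, vy, vz]; we measure [x, y, z] per frame.
    #include <opencv2/video/tracking.hpp>

    cv::KalmanFilter makeTranslationFilter(float dt) {
        cv::KalmanFilter kf(6, 3, 0, CV_32F);     // 6 state vars, 3 measured, no control
        kf.transitionMatrix = (cv::Mat_<float>(6, 6) <<
            1, 0, 0, dt, 0,  0,
            0, 1, 0, 0,  dt, 0,
            0, 0, 1, 0,  0,  dt,
            0, 0, 0, 1,  0,  0,
            0, 0, 0, 0,  1,  0,
            0, 0, 0, 0,  0,  1);
        cv::setIdentity(kf.measurementMatrix);    // observe x, y, z directly
        cv::setIdentity(kf.processNoiseCov,     cv::Scalar::all(1e-4));
        cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-2));
        cv::setIdentity(kf.errorCovPost,        cv::Scalar::all(1.0));
        return kf;
    }

    // Per frame: predict, then correct with the measured translation when the
    // tag is detected; when it is lost, the prediction alone bridges the gap.
    cv::Mat filterStep(cv::KalmanFilter &kf, bool detected, const cv::Vec3f &measured) {
        cv::Mat predicted = kf.predict();
        if (!detected) return predicted.rowRange(0, 3).clone();
        cv::Mat measurement = (cv::Mat_<float>(3, 1) <<
                               measured[0], measured[1], measured[2]);
        cv::Mat corrected = kf.correct(measurement);
        return corrected.rowRange(0, 3).clone();
    }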

The exact same thing goes for chilitrack: https://github.com/chili-epfl/chilitrack/issues/4

Also changing the name to reflect the issue better.

ayberkozgur avatar Oct 26 '14 18:10 ayberkozgur

Related issue in OpenCV: https://github.com/opencv/opencv/issues/8813

And here's a workaround.

valtron avatar May 20 '18 07:05 valtron

Thanks for the tip :)

qbonnard avatar May 21 '18 21:05 qbonnard

Any tips on how to fix this after my homogeneous transform has been created? I have a strong constraint that my pattern faces in a certain direction, so detecting the "flipping" is not a problem.

I would like to transform this: [image]

to this: [image]

My biggest problem is that translation is also affected by the flipping.

andraspalffy avatar Jun 06 '18 15:06 andraspalffy