
Returned tags should have confidence values

Open · ayberkozgur opened this issue 10 years ago · 9 comments

Returned tags should have confidence values (e.g. between 0 and 1), just like the skeleton joint confidences returned by the OpenNI skeleton tracker for Kinect. This would be very useful on the user side, where we could e.g. decide to trust one tag more than another in a multi-tag setting.

The confidence could be based on a single metric or a combination of several; some ideas:

  • Pixel colors on top of the tag bits: pure black/white indicates high confidence, while gray tones indicate low confidence
  • Give tags with a smaller visual area lower confidence
  • Give lower confidence to tags whose visual area is below a certain threshold
  • Give flatter tags, i.e. tags seen at a more oblique angle, lower confidence
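The bullet list above translates fairly directly into numbers. Below is a minimal C++ sketch of one possible score per idea, assuming the caller already has the sampled bit intensities, the four corner positions in pixels, and the tag normal and line of sight in camera coordinates. All function names, the mid-gray value of 127.5, the linear area ramp and the product combination are assumptions for illustration, not Chilitags code.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// All names below are hypothetical; none of them are part of the Chilitags API.
struct Point2f {
    float x, y;
};

// Bit contrast: a sampled bit intensity (0..255) scores 1 when it is pure black
// or pure white and 0 when it is mid-gray (127.5). Returns the mean score.
float bitContrastConfidence(const std::vector<float> &bitIntensities) {
    if (bitIntensities.empty()) return 0.0f;
    float sum = 0.0f;
    for (float v : bitIntensities) sum += std::abs(v - 127.5f) / 127.5f;
    return sum / static_cast<float>(bitIntensities.size());
}

// Visual area of the tag quadrilateral in pixels (shoelace formula), mapped to
// [0, 1]: 0 at or below minArea, 1 at or above fullArea, linear in between.
// Assumes fullArea > minArea.
float areaConfidence(const std::array<Point2f, 4> &corners, float minArea, float fullArea) {
    float twiceArea = 0.0f;
    for (std::size_t i = 0; i < corners.size(); ++i) {
        const Point2f &a = corners[i];
        const Point2f &b = corners[(i + 1) % corners.size()];
        twiceArea += a.x * b.y - b.x * a.y;
    }
    const float area = std::abs(twiceArea) / 2.0f;
    return std::clamp((area - minArea) / (fullArea - minArea), 0.0f, 1.0f);
}

// Viewing angle: |cos| of the angle between the tag normal and the line of
// sight (both in camera coordinates). Facing the camera scores 1, edge-on 0.
float viewAngleConfidence(const std::array<float, 3> &normal,
                          const std::array<float, 3> &lineOfSight) {
    float dot = 0.0f, n2 = 0.0f, l2 = 0.0f;
    for (std::size_t i = 0; i < 3; ++i) {
        dot += normal[i] * lineOfSight[i];
        n2 += normal[i] * normal[i];
        l2 += lineOfSight[i] * lineOfSight[i];
    }
    if (n2 == 0.0f || l2 == 0.0f) return 0.0f;
    return std::abs(dot) / std::sqrt(n2 * l2);
}

// Product: any single weak metric pulls the overall confidence down.
float combinedConfidence(float contrast, float area, float angle) {
    return contrast * area * angle;
}
```

The product is a conservative choice; a weighted mean would be more forgiving of one weak metric.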

ayberkozgur, Apr 29 '14 13:04

Makes sense. Another metric would be the gradient value used for the corner detection.
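One hedged way to turn the gradient idea into a number: measure the image gradient magnitude at each detected corner and keep the weakest corner. This sketch uses plain central differences on an 8-bit grayscale buffer; `GrayImage`, `gradientConfidence` and `tagCornerConfidence` are hypothetical names, and the gradient used inside Chilitags' own corner detection may be computed differently.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row-major 8-bit grayscale image.
struct GrayImage {
    int width, height;
    std::vector<std::uint8_t> pixels;
    float at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
                      static_cast<std::size_t>(x)];
    }
};

// Gradient magnitude at (x, y) from central differences, normalized to [0, 1]
// (the largest possible magnitude for 8-bit data is 127.5 * sqrt(2)).
// Border pixels, where central differences are undefined, score 0.
float gradientConfidence(const GrayImage &img, int x, int y) {
    if (x <= 0 || y <= 0 || x >= img.width - 1 || y >= img.height - 1) return 0.0f;
    const float gx = (img.at(x + 1, y) - img.at(x - 1, y)) / 2.0f;
    const float gy = (img.at(x, y + 1) - img.at(x, y - 1)) / 2.0f;
    const float maxMagnitude = 127.5f * std::sqrt(2.0f);
    return std::sqrt(gx * gx + gy * gy) / maxMagnitude;
}

// Tag-level score: the weakest of the four corners, given as {x, y} pixels.
float tagCornerConfidence(const GrayImage &img,
                          const std::array<std::array<int, 2>, 4> &corners) {
    float weakest = 1.0f;
    for (const auto &c : corners)
        weakest = std::min(weakest, gradientConfidence(img, c[0], c[1]));
    return weakest;
}
```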

What is your use case?

qbonnard, Apr 29 '14 14:04

I have multiple tags that represent landmark objects; they lie on the table in a predefined configuration and do not move. When the camera sees multiple tags, it will set the virtual GL camera's transform to the inverse transform of the most confident tag (or to a weighted average over all visible tags, etc.). This way, I will be able to build a scene model and implement all sorts of things, e.g. augmented 3D models, physics, etc.
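The "inverse transform of the most confident tag" step could look like the sketch below, assuming each detection comes with a rigid tag-to-camera transform (rotation R, translation t) and the proposed confidence value. `RigidTransform`, `TagObservation` and `cameraPoseFromBestTag` are hypothetical names; no such confidence is returned by Chilitags yet.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Hypothetical rigid transform: p_camera = R * p_tag + t.
struct RigidTransform {
    std::array<std::array<float, 3>, 3> R;
    std::array<float, 3> t;
};

// Hypothetical per-tag observation carrying the proposed confidence value.
struct TagObservation {
    RigidTransform tagToCamera;
    float confidence;
};

// Inverse of a rigid transform: R' = R^T and t' = -R^T t.
RigidTransform inverse(const RigidTransform &T) {
    RigidTransform inv{};  // value-initialized, so inv.t starts at zero
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            inv.R[i][j] = T.R[j][i];
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            inv.t[i] -= inv.R[i][j] * T.t[j];
    return inv;
}

// Camera pose expressed in the frame of the most confident visible tag,
// or std::nullopt when no tag is visible.
std::optional<RigidTransform> cameraPoseFromBestTag(
    const std::map<std::string, TagObservation> &tags) {
    const TagObservation *best = nullptr;
    for (const auto &entry : tags)
        if (best == nullptr || entry.second.confidence > best->confidence)
            best = &entry.second;
    if (best == nullptr) return std::nullopt;
    return inverse(best->tagToCamera);
}
```

The result is the camera pose in that tag's frame; with a predefined table layout, composing it with the tag's known placement gives the pose in the table frame. The weighted-average alternative is simple for translations but needs proper rotation averaging (e.g. of quaternions), which is not shown here.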

ayberkozgur, Apr 29 '14 14:04

In this case, I think it would be better to use Chilitags3D::readTagConfiguration to have a scene object whose transform is estimated from multiple tags. The confidence would be useful for weighting the multiple tags in this estimation (internally). Have you tried Chilitags3D::readTagConfiguration already?

qbonnard, Apr 29 '14 14:04

I know about tag configurations, but I will also have tags that move, and I need to be able to dynamically add/remove these landmarks (simulating the destruction of augmented objects). This is why I can't treat them as a single object. Also, in the (not very near) future, I plan to estimate the landmark configuration by solving the corresponding SLAM problem, eliminating the predefined landmark configuration altogether.

In addition, I think that once the confidence values are calculated, there is no reason not to export them to the user along with the tag transforms instead of keeping them internal.

ayberkozgur, Apr 29 '14 15:04

I was just checking that the readTagConfiguration method was not too buried ;) Sure, the confidence values make sense. For the dynamic modification of the landmarks, would it help to have a setTagConfiguration method that allows modifying the tag configuration without having to use a file? I think that's missing anyway...

qbonnard, Apr 29 '14 15:04

Yes, now that I think about it, setting the tag configuration dynamically makes sense for the landmark use case.

ayberkozgur, Apr 29 '14 15:04

Noted: https://github.com/chili-epfl/chilitags/issues/42

qbonnard, Apr 29 '14 15:04

Returning confidence seems interesting for advanced applications, indeed, but we also want to keep one "simple" API (aka detect tags in 2 obvious lines). So, either we add a findWithConfidence or we do some template magic on the return type to provide the confidence only when needed.
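The "template magic on the return type" option could be sketched as follows: a defaulted template parameter keeps the simple call unchanged and returns a plain name-to-transform map, while `findTags<WithConfidence>(...)` returns transform-plus-confidence entries. `findTags`, `PoseOnly`, `WithConfidence`, `Detection` and the `std::array` stand-in for a 4x4 matrix are all hypothetical, and C++17 is assumed for `if constexpr`.

```cpp
#include <array>
#include <map>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a 4x4 transform (row-major).
using Transform = std::array<float, 16>;

// Option tags selecting what each detected tag maps to.
struct PoseOnly {};
struct WithConfidence {};

struct PoseAndConfidence {
    Transform transform;
    float confidence;
};

// Hypothetical raw detection, standing in for the detector's internal results.
struct Detection {
    std::string name;
    Transform transform;
    float confidence;
};

template <typename Option>
using ResultValue = std::conditional_t<std::is_same_v<Option, WithConfidence>,
                                       PoseAndConfidence, Transform>;

// findTags(d) keeps the simple map<name, transform>;
// findTags<WithConfidence>(d) adds the confidence to each entry.
template <typename Option = PoseOnly>
std::map<std::string, ResultValue<Option>> findTags(const std::vector<Detection> &detections) {
    static_assert(std::is_same_v<Option, PoseOnly> || std::is_same_v<Option, WithConfidence>,
                  "Option must be PoseOnly or WithConfidence");
    std::map<std::string, ResultValue<Option>> result;
    for (const Detection &d : detections) {
        if constexpr (std::is_same_v<Option, WithConfidence>)
            result[d.name] = PoseAndConfidence{d.transform, d.confidence};
        else
            result[d.name] = d.transform;
    }
    return result;
}
```

Users who never ask for confidence see exactly the same map type as before; the richer type only appears when the option is named explicitly.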

severin-lemaignan, Apr 30 '14 06:04

This is important: tags whose transforms change a lot over time (e.g. a tag whose pose flips every couple of frames) should have drastically lower confidence values. In any case, I will be implementing this functionality in my application. Currently, I calculate the "spread" of a batch of transform samples, which is a weighted sum of the traces of the translation sample covariance matrix (3x3) and the rotation sample covariance matrix (4x4, since rotations are quaternions). The sample batch is e.g. the last 30 values of the tag transform.

Of course, there are other metrics that use the whole covariance matrices as well, but I used this method and found it satisfactory. It's cheap to calculate since only the diagonal values of the covariance matrices are needed. See the code I'm currently using at https://github.com/chili-epfl/cellulo/blob/master/core/src/ch/epfl/chili/cellulo/math/util/TransformSampleBatch.java.
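A C++ sketch of the spread metric described above (not a port of the linked Java file): keep the last N poses of a tag, sum the per-component sample variances of the translation (the trace of its 3x3 covariance) and of the quaternion (the trace of its 4x4 covariance), and weight the two traces. `PoseSample` and the weights are hypothetical; the quaternion hemisphere alignment and the `confidenceFromSpread` mapping are assumptions added for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical pose sample: translation (x, y, z) and unit quaternion (w, x, y, z).
struct PoseSample {
    std::array<double, 3> translation;
    std::array<double, 4> rotation;
};

// Keeps the last `capacity` poses of one tag and measures their spread.
class TransformSampleBatch {
public:
    explicit TransformSampleBatch(std::size_t capacity = 30) : capacity_(capacity) {}

    void add(PoseSample sample) {
        // Assumption of this sketch: q and -q encode the same rotation, so the new
        // quaternion is flipped into the hemisphere of the oldest stored one and
        // sign changes alone do not inflate the rotation variance.
        if (!samples_.empty()) {
            double dot = 0.0;
            for (std::size_t i = 0; i < 4; ++i)
                dot += samples_.front().rotation[i] * sample.rotation[i];
            if (dot < 0.0)
                for (double &c : sample.rotation) c = -c;
        }
        samples_.push_back(sample);
        if (samples_.size() > capacity_) samples_.pop_front();
    }

    // weightT * trace(Cov(translation)) + weightR * trace(Cov(rotation)).
    // A trace is the sum of the per-component sample variances, so only the
    // diagonal of each covariance matrix is computed.
    double spread(double weightT, double weightR) const {
        assert(samples_.size() >= 2 && "need at least two samples");
        const double n = static_cast<double>(samples_.size());
        double traceT = 0.0, traceR = 0.0;
        for (std::size_t c = 0; c < 7; ++c) {  // 3 translation + 4 quaternion components
            double mean = 0.0;
            for (const PoseSample &s : samples_) mean += component(s, c);
            mean /= n;
            double variance = 0.0;
            for (const PoseSample &s : samples_) {
                const double d = component(s, c) - mean;
                variance += d * d;
            }
            variance /= n - 1.0;  // unbiased sample variance
            if (c < 3) traceT += variance; else traceR += variance;
        }
        return weightT * traceT + weightR * traceR;
    }

    std::size_t size() const { return samples_.size(); }

private:
    static double component(const PoseSample &s, std::size_t c) {
        return c < 3 ? s.translation[c] : s.rotation[c - 3];
    }

    std::size_t capacity_;
    std::deque<PoseSample> samples_;
};

// Hypothetical mapping from spread to a confidence in (0, 1]:
// zero spread gives 1, and the confidence halves when spread == scale.
double confidenceFromSpread(double spread, double scale) {
    return 1.0 / (1.0 + spread / scale);
}
```

The weights and the scale depend on the units of the translation (e.g. millimetres) and would have to be tuned per application.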

By the way, I don't see any harm in returning as much information as possible from a 3D detection in a struct (as long as the extras can be turned off with flags at object creation or at runtime, for performance reasons). The "detect tags in 2 obvious lines" code won't change by a single character, and the detection result will be tag.second.transform instead of tag.second, which is more explicit and, if you ask me, more readable. And if these extra calculations (e.g. confidence) are turned off by default, neither the API complexity nor the default performance will change for the simple user.
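The struct-based result could look like the following sketch, in which extras are disabled by default and can be enabled at construction or at runtime. `TagResult`, `DetectionOptions`, `Estimator` and the callback-style `ConfidenceMetric` are hypothetical names, and `estimate` takes raw poses only to stand in for the real detection step.

```cpp
#include <array>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a 4x4 transform (row-major).
using Transform = std::array<float, 16>;

// Hypothetical per-tag result: the transform plus optional extra information.
struct TagResult {
    Transform transform{};
    float confidence = -1.0f;  // stays -1 when confidence is not computed
};

// Hypothetical options; extras are off by default so the simple path does no extra work.
struct DetectionOptions {
    bool computeConfidence = false;
};

// Pluggable metric, e.g. one based on the spread of recent poses.
using ConfidenceMetric = std::function<float(const std::string &, const Transform &)>;

class Estimator {
public:
    explicit Estimator(DetectionOptions options = {}, ConfidenceMetric metric = nullptr)
        : options_(options), metric_(std::move(metric)) {}

    // Extras can also be toggled at runtime.
    void setComputeConfidence(bool on) { options_.computeConfidence = on; }

    // Wraps raw poses (standing in for the actual detection step) into results.
    std::map<std::string, TagResult> estimate(const std::map<std::string, Transform> &rawPoses) const {
        std::map<std::string, TagResult> results;
        for (const auto &pose : rawPoses) {
            TagResult r;
            r.transform = pose.second;
            if (options_.computeConfidence && metric_)
                r.confidence = metric_(pose.first, pose.second);
            results.emplace(pose.first, r);
        }
        return results;
    }

private:
    DetectionOptions options_;
    ConfidenceMetric metric_;
};
```

A caller then reads `tag.second.transform` inside the usual loop over the results, and `tag.second.confidence` only if it enabled the option.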

ayberkozgur, May 20 '14 17:05