ThreatExchange
tmk-query false positives
Hello again!
After extensive testing, I'm running into a lot of false positive results across many different videos using the tmk-query tool.
Here is a link to a public GitHub repo I created with one such example. I've included two different video files and their respective TMK hashes.
After hashing these two separate videos, I'm comparing my needles and haystack set with:
./tmk-query --c1 0.7 --c2 0.7 needles.txt haystack.txt | sort -n
The result is:
0.828915 0.790789 clip_65_418_430.tmk video.tmk
Could you provide any insight into why these videos are matching? Am I missing something when creating hashes or comparing with the tmk-query tool? I can provide more examples if needed.
Thank you!!
@johnkerl do you have any idea why I'm getting false positive matches with these two completely different videos?
@jcohenho thank you -- first thought is perhaps the ~0.8 tolerance zone was too loose.
I'm excited to hear about the extensive testing! Can you share some information about the level-1 and level-2 scores for more of your false-positive cases?
Hi @johnkerl,
Just to clarify, when I run tmk-query, the first two columns are the level-1 and level-2 scores that correspond to the --c1 and --c2 arguments, correct? Do you just want to see more results from my tmk-query against the library of videos I'm testing with?
Here is a longer list:
0.701424 0.724250 ./needles/clip_63_394_409.tmk ../../haystack/4304762620.tmk
0.701482 0.729434 ./needles/clip_65_418_430.tmk ../../haystack/4304762620.tmk
0.701586 0.727049 ./needles/clip_62_378_394.tmk ../../haystack/8545625950.tmk
0.708319 0.709165 ./needles/clip_59_357_362.tmk ../../haystack/9876092082.tmk
0.708781 0.703152 ./needles/clip_65_418_430.tmk ../../haystack/1705689899.tmk
0.709351 0.707681 ./needles/clip_90_456_466.tmk ../../haystack/3132932351.tmk
0.710304 0.706610 ./needles/clip_90_456_466.tmk ../../haystack/7488560177.tmk
0.710640 0.706276 ./needles/clip_65_418_430.tmk ../../haystack/7403319953.tmk
0.711533 0.709757 ./needles/clip_63_394_409.tmk ../../haystack/6824241941.tmk
0.712311 0.736267 ./needles/clip_65_418_430.tmk ../../haystack/9637086807.tmk
0.712575 0.729686 ./needles/clip_65_418_430.tmk ../../haystack/9537219712.tmk
0.713084 0.703157 ./needles/clip_65_418_430.tmk ../../haystack/5288489068.tmk
0.713765 0.705212 ./needles/clip_90_456_466.tmk ../../haystack/7201179539.tmk
0.718981 0.723688 ./needles/clip_16_140_153.tmk ../../haystack/5540279788.tmk
0.719635 0.702121 ./needles/clip_65_418_430.tmk ../../haystack/8794779068.tmk
0.721018 0.714752 ./needles/clip_59_357_362.tmk ../../haystack/1689354702.tmk
0.722227 0.729621 ./needles/clip_59_357_362.tmk ../../haystack/8953057686.tmk
0.722591 0.713023 ./needles/clip_65_418_430.tmk ../../haystack/3883036589.tmk
0.722637 0.713299 ./needles/clip_37_237_246.tmk ../../haystack/1639206967.tmk
0.723007 0.735340 ./needles/clip_90_456_466.tmk ../../haystack/9637086807.tmk
0.725125 0.715759 ./needles/clip_106_664_684.tmk ../../haystack/2661278471.tmk
0.725164 0.736653 ./needles/clip_65_418_430.tmk ../../haystack/0481629324.tmk
0.729034 0.700245 ./needles/clip_90_456_466.tmk ../../haystack/1735605575.tmk
0.729100 0.745117 ./needles/clip_90_456_466.tmk ../../haystack/5347345302.tmk
0.729102 0.727861 ./needles/clip_65_418_430.tmk ../../haystack/8748537131.tmk
0.729626 0.720933 ./needles/clip_59_357_362.tmk ../../haystack/7148697053.tmk
0.732095 0.736921 ./needles/clip_62_378_394.tmk ../../haystack/5940883728.tmk
0.733803 0.731598 ./needles/clip_65_418_430.tmk ../../haystack/0970735518.tmk
0.734619 0.725880 ./needles/clip_65_418_430.tmk ../../haystack/2271872409.tmk
0.734865 0.725444 ./needles/clip_106_664_684.tmk ../../haystack/3113745105.tmk
0.735034 0.724343 ./needles/clip_38_246_263.tmk ../../haystack/9696198039.tmk
0.736024 0.713466 ./needles/clip_59_357_362.tmk ../../haystack/6367414579.tmk
0.736624 0.720755 ./needles/clip_38_246_263.tmk ../../haystack/0728610054.tmk
0.739576 0.757686 ./needles/clip_59_357_362.tmk ../../haystack/8712928242.tmk
0.741155 0.721292 ./needles/clip_38_246_263.tmk ../../haystack/4304762620.tmk
0.741983 0.707826 ./needles/clip_4_27_44.tmk ../../haystack/1639206967.tmk
0.742138 0.705701 ./needles/clip_65_418_430.tmk ../../haystack/5209228963.tmk
0.742232 0.734402 ./needles/clip_65_418_430.tmk ../../haystack/5347345302.tmk
0.746027 0.724908 ./needles/clip_90_456_466.tmk ../../haystack/4702449382.tmk
0.746755 0.761313 ./needles/clip_65_418_430.tmk ../../haystack/6779475805.tmk
0.747831 0.725664 ./needles/clip_65_418_430.tmk ../../haystack/7550398664.tmk
0.748039 0.749356 ./needles/clip_59_357_362.tmk ../../haystack/1138692140.tmk
0.748788 0.726332 ./needles/clip_65_418_430.tmk ../../haystack/7195434310.tmk
0.750477 0.746034 ./needles/clip_65_418_430.tmk ../../haystack/3039349834.tmk
0.756347 0.766993 ./needles/clip_65_418_430.tmk ../../haystack/9966117635.tmk
0.757652 0.728766 ./needles/clip_65_418_430.tmk ../../haystack/7073305196.tmk
0.759797 0.749851 ./needles/clip_65_418_430.tmk ../../haystack/4702449382.tmk
0.760223 0.769756 ./needles/clip_59_357_362.tmk ../../haystack/3206890901.tmk
0.760658 0.759999 ./needles/clip_65_418_430.tmk ../../haystack/4736217465.tmk
0.761135 0.754722 ./needles/clip_59_357_362.tmk ../../haystack/4993711965.tmk
0.764817 0.787628 ./needles/clip_106_664_684.tmk ../../haystack/2610938319.tmk
0.765291 0.729330 ./needles/clip_59_357_362.tmk ../../haystack/3224352717.tmk
0.768169 0.787347 ./needles/clip_65_418_430.tmk ../../haystack/1639206967.tmk
0.769148 0.752169 ./needles/clip_65_418_430.tmk ../../haystack/5561811386.tmk
0.769168 0.734276 ./needles/clip_4_27_44.tmk ../../haystack/4304762620.tmk
0.770742 0.765770 ./needles/clip_65_418_430.tmk ../../haystack/8545625950_k.tmk
0.772002 0.747713 ./needles/clip_65_418_430.tmk ../../haystack/6868044816.tmk
0.773601 0.757714 ./needles/clip_65_418_430.tmk ../../haystack/3071884561.tmk
0.774219 0.768757 ./needles/clip_65_418_430.tmk ../../haystack/4733220414.tmk
0.779221 0.742252 ./needles/clip_65_418_430.tmk ../../haystack/2735792862.tmk
0.784012 0.737720 ./needles/clip_37_237_246.tmk ../../haystack/8794779068.tmk
0.788096 0.753398 ./needles/clip_65_418_430.tmk ../../haystack/5723468857.tmk
0.788848 0.783313 ./needles/clip_59_357_362.tmk ../../haystack/8079327790.tmk
0.790015 0.735624 ./needles/clip_90_456_466.tmk ../../haystack/4736217465.tmk
0.792202 0.767012 ./needles/clip_65_418_430.tmk ../../haystack/7522532846.tmk
0.793449 0.721210 ./needles/clip_37_237_246.tmk ../../haystack/0338388076.tmk
0.793711 0.749790 ./needles/clip_65_418_430.tmk ../../haystack/0484671290.tmk
0.798029 0.709030 ./needles/clip_37_237_246.tmk ../../haystack/9966117635.tmk
0.804429 0.799744 ./needles/clip_65_418_430.tmk ../../haystack/5940883728.tmk
0.805439 0.773944 ./needles/clip_65_418_430.tmk ../../haystack/1925738538.tmk
0.806802 0.791950 ./needles/clip_59_357_362.tmk ../../haystack/0134115277.tmk
0.807566 0.775979 ./needles/clip_65_418_430.tmk ../../haystack/4470743183.tmk
0.811182 0.744237 ./needles/clip_4_27_44.tmk ../../haystack/0481629324.tmk
0.817829 0.790668 ./needles/clip_65_418_430.tmk ../../haystack/2124707866.tmk
0.825436 0.761574 ./needles/clip_90_456_466.tmk ../../haystack/3071884561.tmk
0.826490 0.826427 ./needles/clip_59_357_362.tmk ../../haystack/6626160849.tmk
0.828915 0.790789 ./needles/clip_65_418_430.tmk ../../haystack/6325635370.tmk
0.828915 0.790789 ./needles/clip_65_418_430.tmk ../../haystack/8310293810.tmk
0.854163 0.790387 ./needles/clip_5_44_51.tmk ../../haystack/2341324345.tmk
0.854188 0.790400 ./needles/clip_5_44_51.tmk ../../haystack/8512702394.tmk
Only the last two rows are correct matches. Maybe I need to set my threshold higher? I'm working with a library of over 100,000 videos, so I have a lot of content to work with :)
@jcohenho yes I would set the threshold higher. I think your evaluation set is larger than ours was for this project. The evaluation sets we used internally were (a) one smaller, public/general-content one, and (b) one larger, domain-specific dataset. This is really great info to have, adding another (large) dataset! :)
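As a rough illustration of raising the threshold, here is a minimal Python sketch that re-filters saved tmk-query output with stricter cutoffs. It assumes each output line has four whitespace-separated fields (level-1 score, level-2 score, needle path, haystack path), as the listing above suggests. The helper names `parse_matches` and `filter_matches` and the cutoffs 0.85 / 0.79 are hypothetical; the cutoffs were picked only because they happen to separate the rows in this sample and are not recommended values.

```python
from typing import Iterable, List, NamedTuple


class Match(NamedTuple):
    level1: float
    level2: float
    needle: str
    haystack: str


def parse_matches(lines: Iterable[str]) -> List[Match]:
    """Parse lines assumed to look like '<level1> <level2> <needle> <haystack>'."""
    matches = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        try:
            matches.append(Match(float(fields[0]), float(fields[1]), fields[2], fields[3]))
        except ValueError:
            continue  # skip lines whose score columns are not numeric
    return matches


def filter_matches(matches: List[Match], c1: float, c2: float) -> List[Match]:
    """Keep only pairs whose level-1 AND level-2 scores meet the given cutoffs."""
    return [m for m in matches if m.level1 >= c1 and m.level2 >= c2]


# The last four rows of the listing above; only the final two are true matches.
SAMPLE = """\
0.826490 0.826427 ./needles/clip_59_357_362.tmk ../../haystack/6626160849.tmk
0.828915 0.790789 ./needles/clip_65_418_430.tmk ../../haystack/6325635370.tmk
0.854163 0.790387 ./needles/clip_5_44_51.tmk ../../haystack/2341324345.tmk
0.854188 0.790400 ./needles/clip_5_44_51.tmk ../../haystack/8512702394.tmk
"""

matches = parse_matches(SAMPLE.splitlines())
strict = filter_matches(matches, c1=0.85, c2=0.79)  # illustrative cutoffs only
for m in strict:
    print(f"{m.level1:.6f} {m.level2:.6f} {m.needle} {m.haystack}")
```

Note that in this listing a false positive (0.828915 0.790789) has a slightly higher level-2 score than the true matches, so raising --c2 alone would not separate them; here it is the level-1 cutoff that does the work.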
This issue is being marked as stale because it has no recent activity. It will be closed automatically in 14 days unless it becomes active before then. To prevent closing, please comment on the issue before that time. If the issue is no longer relevant, please feel free to close it prior to that time.
Cleaning up stale issues helps redirect focus to the issues top of mind of the community. Thank you for your help with this.
This issue has been closed due to no recent activity. If you need this issue reopened, please let us know. Thanks!
This issue was referenced in https://www.hackerfactor.com/blog/index.php?/archives/971-FB-TMK-PDQ-WTF.html, a longer writeup with practical results.
I'm from after the time when we evaluated TMK, but we may want to update the guidance on thresholds, or put more emphasis on tuning the thresholds for the desired precision/recall.
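For anyone tuning thresholds against their own labeled data, a sweep along these lines may help. This is only a sketch: it assumes you already have a set of known-true (needle, haystack) pairs, and the function name `precision_recall` and the cutoff grid are hypothetical. The sample rows are taken from the listing earlier in this thread, where only the two clip_5_44_51 pairs were reported as correct.

```python
from typing import List, Set, Tuple

Scored = Tuple[float, float, str, str]  # (level1, level2, needle, haystack)


def precision_recall(scored: List[Scored], true_pairs: Set[Tuple[str, str]],
                     c1: float, c2: float) -> Tuple[float, float]:
    """Precision and recall of the pairs that pass both cutoffs."""
    predicted = {(n, h) for s1, s2, n, h in scored if s1 >= c1 and s2 >= c2}
    true_positives = len(predicted & true_pairs)
    # Convention used here: with no predictions, precision is reported as 1.0.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(true_pairs) if true_pairs else 1.0
    return precision, recall


SCORED: List[Scored] = [
    (0.826490, 0.826427, "clip_59_357_362.tmk", "6626160849.tmk"),
    (0.828915, 0.790789, "clip_65_418_430.tmk", "6325635370.tmk"),
    (0.854163, 0.790387, "clip_5_44_51.tmk", "2341324345.tmk"),
    (0.854188, 0.790400, "clip_5_44_51.tmk", "8512702394.tmk"),
]
TRUE_PAIRS = {
    ("clip_5_44_51.tmk", "2341324345.tmk"),
    ("clip_5_44_51.tmk", "8512702394.tmk"),
}

# Sweep the level-1 cutoff while holding the level-2 cutoff at 0.70.
for c1 in (0.70, 0.83, 0.86):
    p, r = precision_recall(SCORED, TRUE_PAIRS, c1=c1, c2=0.70)
    print(f"c1={c1:.2f} precision={p:.2f} recall={r:.2f}")
```

Recall computed this way only covers true pairs that appear in the scored list at all, so the initial tmk-query run should use cutoffs loose enough not to drop real matches before the sweep.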