lancelot
triage SoK function recall
see https://github.com/williballenthin/lancelot/blob/master/resources/evaluation/SoK/analyze-sok.ipynb

pick a test case:
(env) user@hostname ~/c/l/r/e/SoK> python benchmark.py tee
lancelot vs SoK test suite
functions:
precision: 0.971
recall: 0.529
basic blocks:
precision: 0.990
recall: 0.813
instructions:
precision: 0.998
recall: 0.812
worst performing test cases:
-------- -----------------------------------
0.319658 SoK-windows-testsuite/cl_O2/tee
0.319658 SoK-windows-testsuite/cl_Ox/tee
0.320613 SoK-windows-testsuite/cl_O1/tee
0.32097 SoK-windows-testsuite/cl_Od/tee
0.737123 SoK-windows-testsuite/cl_m32_O2/tee
0.737123 SoK-windows-testsuite/cl_m32_Ox/tee
0.738281 SoK-windows-testsuite/cl_m32_O1/tee
0.738636 SoK-windows-testsuite/cl_m32_Od/tee
-------- -----------------------------------
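The precision and recall figures above can be read as set comparisons between the ground truth addresses and the addresses lancelot recovers. A minimal sketch, assuming the benchmark uses the standard set-based definitions (not checked against benchmark.py); `precision_recall` and the toy addresses are hypothetical:

```python
def precision_recall(ground_truth, found):
    """Return (precision, recall) for a set of recovered addresses.

    Assumes the standard definitions:
      precision = |ground_truth & found| / |found|
      recall    = |ground_truth & found| / |ground_truth|
    """
    true_positives = len(ground_truth & found)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the test suite.
gt = {0x1000, 0x1010, 0x1400, 0x1460}
lan = {0x1000, 0x1400, 0x2000}

p, r = precision_recall(gt, lan)
print(f"precision: {p:.3f}")
print(f"recall: {r:.3f}")
```

Under these definitions, low recall with high precision (as for tee) means lancelot rarely reports bogus functions but misses many real ones.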
dump the functions:
python dump_ground_truth_report.py SoK-windows-testsuite/cl_O2/tee/tee.gt.json.gz | grep function | sort > /tmp/gt-functions.txt
python dump_lancelot_report.py SoK-windows-testsuite/cl_O2/tee/tee.exe | grep function | sort > /tmp/lan-functions.txt
diff:
diff /tmp/gt-functions.txt /tmp/lan-functions.txt | head -n 30
master
2d1
< function: 0x140001010
5,7d3
< function: 0x140001400
< function: 0x140001460
< function: 0x140001484
13,18d8
< function: 0x14000161c
< function: 0x140001630
< function: 0x1400017a8
< function: 0x1400017bc
< function: 0x1400017c4
< function: 0x1400017cc
21d10
< function: 0x14000184c
24d12
< function: 0x1400019d0
27,32d14
< function: 0x140001bc4
< function: 0x140001bdc
< function: 0x140001bfc
< function: 0x140001c08
< function: 0x140001c54
< function: 0x140001c84
34,40d15
< function: 0x140001ccc
< function: 0x140001d00
< function: 0x140001d18
< function: 0x140001d40
< function: 0x140001d58
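Lines starting with `<` in the diff above are functions present in the ground truth but missing from lancelot's output. The same comparison can be sketched in Python, which separates misses from extras explicitly. This assumes each line of the two files has the form `function: 0x...` as shown in the diff; `parse_functions` and `compare` are hypothetical helpers, not part of the lancelot scripts:

```python
import re
from pathlib import Path

# Matches lines such as "function: 0x140001010" (format taken from the diff output).
FUNCTION_LINE = re.compile(r"^function: (0x[0-9a-fA-F]+)")


def parse_functions(lines):
    """Collect function start addresses from dump lines."""
    addrs = set()
    for line in lines:
        m = FUNCTION_LINE.match(line.strip())
        if m:
            addrs.add(int(m.group(1), 16))
    return addrs


def compare(gt_lines, lan_lines):
    """Return (missed, extra): sorted false negatives and false positives."""
    gt = parse_functions(gt_lines)
    lan = parse_functions(lan_lines)
    return sorted(gt - lan), sorted(lan - gt)


if __name__ == "__main__":
    gt_path = Path("/tmp/gt-functions.txt")
    lan_path = Path("/tmp/lan-functions.txt")
    # Only runs when the files from the dump step exist.
    if gt_path.exists() and lan_path.exists():
        missed, extra = compare(
            gt_path.read_text().splitlines(),
            lan_path.read_text().splitlines(),
        )
        print(f"missed by lancelot: {len(missed)}")
        for addr in missed[:30]:
            print(f"  {addr:#x}")
        print(f"not in ground truth: {len(extra)}")
```

The missed addresses are the ones to inspect by hand when triaging recall.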
v0.3.6 9bac44dc7d789a87900e5dcaf17615b37eaf8903
lancelot vs SoK test suite
functions:
precision: 0.892
recall: 0.746
basic blocks:
precision: 0.989
recall: 0.801
instructions:
precision: 0.996
recall: 0.804
worst performing test cases:
-------- ------------------------------------
0.319658 SoK-windows-testsuite/cl_O2/tee
0.319658 SoK-windows-testsuite/cl_Ox/tee
0.320613 SoK-windows-testsuite/cl_O1/tee
0.32097 SoK-windows-testsuite/cl_Od/tee
0.322366 SoK-windows-testsuite/cl_O2/xxd
0.322366 SoK-windows-testsuite/cl_Ox/xxd
0.322727 SoK-windows-testsuite/cl_O1/xxd
0.323776 SoK-windows-testsuite/cl_Od/xxd
0.368915 SoK-windows-testsuite/cl_O2/pageant
0.370444 SoK-windows-testsuite/cl_Ox/pageant
0.373436 SoK-windows-testsuite/cl_O1/pageant
0.374484 SoK-windows-testsuite/cl_O2/puttygen
0.376033 SoK-windows-testsuite/cl_Ox/puttygen
0.382467 SoK-windows-testsuite/cl_O1/puttygen
0.392491 SoK-windows-testsuite/cl_Od/pageant
0.403837 SoK-windows-testsuite/cl_Od/puttygen
0.408605 SoK-windows-testsuite/cl_O2/puttytel
0.409165 SoK-windows-testsuite/cl_Ox/puttytel
0.414866 SoK-windows-testsuite/cl_O1/puttytel
0.430197 SoK-windows-testsuite/cl_Od/puttytel
-------- ------------------------------------
v0.4.2 7a9793979e9e129e95a77785e70179076887d54e
lancelot vs SoK test suite
functions:
precision: 0.871 (-0.02)
recall: 0.850 (+0.10)
basic blocks:
precision: 0.987 (no change)
recall: 0.885 (+0.08)
instructions:
precision: 0.995 (no change)
recall: 0.903 (+0.10)
worst performing function recall:
-------- ------------------------------------
0.540136 SoK-windows-testsuite/cl_O2/tee
0.540136 SoK-windows-testsuite/cl_Ox/tee
0.5403 SoK-windows-testsuite/cl_O1/tee
0.5403 SoK-windows-testsuite/cl_Od/tee
0.544627 SoK-windows-testsuite/cl_O2/xxd
0.544627 SoK-windows-testsuite/cl_Ox/xxd
0.545105 SoK-windows-testsuite/cl_O1/xxd
-------- ------------------------------------
worst performing function precision:
-------- ---------------------------------------
0.454656 SoK-windows-testsuite/cl_Ox/libxml2
0.456754 SoK-windows-testsuite/cl_O2/libxml2
0.517874 SoK-windows-testsuite/cl_O2/tiffcrop
0.520982 SoK-windows-testsuite/cl_O2/vim
0.522531 SoK-windows-testsuite/cl_Ox/tiffcrop
0.541377 SoK-windows-testsuite/cl_Ox/vim
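The per-metric deltas in the v0.4.2 summary can be recomputed from the two sets of results. A minimal sketch, assuming the deltas are the difference between the reported three-decimal values rounded to two decimals; `annotate` is a hypothetical helper, shown here with the basic block and instruction rows of the v0.3.6 and v0.4.2 results:

```python
def annotate(old, new):
    """Format a metric with its change relative to the previous release."""
    delta = round(new - old, 2)
    if delta == 0:
        return f"{new:.3f} (no change)"
    return f"{new:.3f} ({delta:+.2f})"


# Values copied from the v0.3.6 and v0.4.2 results above.
v036 = {
    "basic blocks": {"precision": 0.989, "recall": 0.801},
    "instructions": {"precision": 0.996, "recall": 0.804},
}
v042 = {
    "basic blocks": {"precision": 0.987, "recall": 0.885},
    "instructions": {"precision": 0.995, "recall": 0.903},
}

for level, metrics in v042.items():
    print(f"{level}:")
    for name, value in metrics.items():
        print(f"{name}: {annotate(v036[level][name], value)}")
```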