capa flirt: consider using rizin/sigdb signatures

https://github.com/rizinorg/sigdb

Jul 06 '22 20:07 williballenthin

https://github.com/rizinorg/sigdb-source

Jul 06 '22 20:07 williballenthin

We can also use this for floss.

Jul 08 '22 20:07 r0ny123

@williballenthin, I have looked into adding .sig files. The signature file paths are loaded by viv_utils.flirt.load_flirt_signature. We could make the sigs folder a submodule linked to https://github.com/rizinorg/sigdb, and use the path to that sigs folder as the default when running capa if the user doesn't specify one.
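
As a rough illustration of the default-path idea, here is a minimal sketch. The helper name resolve_signature_paths and its parameters are hypothetical and not part of capa or viv-utils; it only assumes that signatures are stored as .sig files somewhere under a directory.

```python
from pathlib import Path
from typing import List, Optional


def resolve_signature_paths(user_path: Optional[str], default_dir: str) -> List[Path]:
    """Return the .sig files to load, preferring a user-supplied location.

    Hypothetical helper: the name and behavior are assumptions, not capa's API.
    """
    root = Path(user_path) if user_path is not None else Path(default_dir)
    if root.is_file():
        # The user pointed at a single signature file.
        return [root]
    if not root.is_dir():
        # Nothing to load, e.g. the submodule was not checked out.
        return []
    # Sort so the load order is deterministic across platforms.
    return sorted(p for p in root.rglob("*.sig") if p.is_file())
```

Each returned path could then be handed to the existing loader. Whether load order matters for ambiguous matches would still need to be checked.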

Jun 12 '23 11:06 Aayush-Goel-04

that sounds good.

prior to doing this, would you investigate the coverage provided by the rizin sig files versus what we currently distribute? you can match FLIRT signatures against a collection of PE files, perhaps around 100, and see how often each symbol matches. then, we should be able to decide if we want to commit to rizin.
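
One way to tally "how often each symbol matches" is sketched below. It assumes the per-file match results have already been collected into a mapping from file name to matched function names; the function name count_symbol_matches and the input shape are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def count_symbol_matches(matches_by_file: Dict[str, Iterable[str]]) -> Counter:
    """Count in how many files each symbol name was matched."""
    counts: Counter = Counter()
    for names in matches_by_file.values():
        # Use a set so a symbol matched at several addresses in one file counts once.
        counts.update(set(names))
    return counts
```

Running this once per signature set gives two counters that can be compared directly, for example with counts.most_common().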

Jun 12 '23 11:06 williballenthin

see how often each symbol matches.

Do you mean identifying how often each library function (e.g. strcpy) matches? I think a script to check function coverage can be produced by making some modifications to scripts/match-function-id.py. As you said, we can pick about 100 random PEs from tests/data, count the matches for each symbol, and compare the stats with the current sigs. Thoughts, @williballenthin?

Jun 12 '23 19:06 Aayush-Goel-04

yes, exactly. you can use that script to see which functions are matched given a signature file.

i'm optimistic that both sets cover about the same functions, but i'm really not sure.

feel free to present the data in any way that makes sense to you, showing the trade offs between the two signature sets.

Jun 12 '23 19:06 williballenthin

i can also provide a collection of random files or you can use the files in capa-testfiles. the second idea might be a bit easier since they're already available.

Jun 12 '23 19:06 williballenthin

I will go ahead with capa-testfiles. @williballenthin, if possible, wouldn't it be better to directly check which source of sigs covers more symbols?

Jun 12 '23 19:06 Aayush-Goel-04

@williballenthin [Screenshot 2023-06-13 at 6 02 40 PM]

In the screenshot above there are three terminals. Above each terminal is the txt file generated by the match-function-id.py script with some modifications; it lists the functions found and how many times each was found. Each terminal also shows the order in which the sigs are compiled. Currently I am using the capa/sigs files.

In the left and right cases the sig compiling order is reversed, and the function found on line 27 differs between the two: _exit versus _Curl_hash_clean. Is that because the same function has different names in different sigs? The middle terminal shows the results when using only 3_flare_common_libs.sig. The sample is PMA 12-02.exe.

So, should this later be treated as a single function, or counted as two different symbols for coverage? Also, is the symbol '?' in the first line a valid function?

Jun 13 '23 12:06 Aayush-Goel-04

what i'm most interested in is whether the rizin database matches approximately the same number of functions (or more) as the siglib databases. it's reassuring to see about the same names in the results above! i don't think it's important to investigate every difference - just the approximate total counts and coverage.

i think "?" is possibly used to indicate that there were matches but it's not clear which one applies (an ambiguous match).

i don't quite understand your question, though. did the above explanation help? or if you need a different response, can you rephrase the question?

Jun 13 '23 18:06 williballenthin

Matches for functions in the PMA files are low when the rizin/sigdb files are used, but rizin/sigdb shows more matches for files other than the PMA ones. Should I include the PMA files in my tests?

Jun 13 '23 18:06 Aayush-Goel-04

yeah please include all the testfiles, if possible

Jun 13 '23 18:06 williballenthin

@williballenthin, running python3 scripts/match-function-id.py tests/data/6cc148363200798a12091b97a17181a1.exe_ --signature sigs/1_flare_msvc_rtf_32_64.sig fails at function 0x1401d4e60 with: 'NoneType' object is not subscriptable. Can you think of what's causing the error here?

Traceback (most recent call last):
  File "/Users/ayush.goel/Documents/GitHub/capa/scripts/match-function-id.py", line 134, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/ayush.goel/Documents/GitHub/capa/scripts/match-function-id.py", line 126, in main
    name = viv_utils.flirt.match_function_flirt_signatures(analyzer.matcher, vw, function)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/viv_utils/flirt.py", line 188, in match_function_flirt_signatures
    loc_va = vw.getLocation(ref_va)[vivisect.const.L_VA]
             ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable

Jun 13 '23 20:06 Aayush-Goel-04

looks like getLocation is returning None when we expect it to be a tuple. this needs a fix in viv-utils. you're welcome to propose this or i can do it tomorrow. in the meantime, maybe update the script to catch such an exception?
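
A minimal sketch of such a stopgap follows, assuming the only failure to tolerate is the TypeError shown in the traceback. The wrapper name match_or_none is hypothetical; it takes the matcher as a parameter so the sketch runs without vivisect installed.

```python
from typing import Any, Callable, Optional


def match_or_none(match: Callable[..., Any], *args: Any) -> Optional[Any]:
    """Call a matcher and return None instead of raising on a TypeError."""
    try:
        return match(*args)
    except TypeError as e:
        # Per the traceback, the error comes from subscripting a None location.
        print(f"skipping function: {e}")
        return None


# In the script, the call from the traceback would become something like:
# name = match_or_none(
#     viv_utils.flirt.match_function_flirt_signatures, analyzer.matcher, vw, function
# )
```

Catching TypeError this broadly can also hide unrelated bugs, so this is only meant as a workaround until the fix lands in viv-utils.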

Jun 13 '23 20:06 williballenthin

looks like getLocation is returning None when we expect it to be a tuple. this needs a fix in viv-utils.

Done in PR https://github.com/williballenthin/viv-utils/pull/116

Jun 13 '23 21:06 Aayush-Goel-04

Using capa/sigs, a total of 8282 unique symbols were found across all PEs.
Using rizin/sigdb, a total of 5781 unique symbols were found across all PEs.
Across all symbol matches found with the capa and rizin sigs, only 1099 functions had the same name. The txt files below contain the unique functions found using the capa and rizin sigs: allFounds_capa_sigs.txt allFounds_rizin_sigs.txt
Of the 219 .exe_ files processed from capa-testfiles, rizin gave more matches than capa for 101 PEs. For the remaining 118 PEs, capa gave better results.
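
For reference, figures of this kind can be derived with plain set operations. The sketch below is not the script used for the numbers above; the function name, the input shape (file name mapped to the set of matched names) and the handling of ties are all assumptions.

```python
from typing import Dict, Set, Tuple


def compare_signature_sets(
    a: Dict[str, Set[str]], b: Dict[str, Set[str]]
) -> Tuple[int, int, int, int, int]:
    """Summarize two {file: matched names} mappings.

    Returns (unique names in a, unique names in b, names shared by both,
    files where a matched more, files where b matched more).
    """
    names_a = set().union(*a.values())
    names_b = set().union(*b.values())
    a_wins = b_wins = 0
    for path in a.keys() | b.keys():
        n_a = len(a.get(path, set()))
        n_b = len(b.get(path, set()))
        if n_a > n_b:
            a_wins += 1
        elif n_b > n_a:
            b_wins += 1
        # Ties are counted for neither side (an assumption of this sketch).
    return len(names_a), len(names_b), len(names_a & names_b), a_wins, b_wins
```

Comparing by name treats the same function under two different names as two symbols, which is one reason the shared count can come out low.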

The graph below shows the number of unique function matches for all files using the capa and rizin sigs.

The image below shows the difference in matches for all files.

The .xls below contains the results for the graphs above:

allMatches.xls

Jun 14 '23 07:06 Aayush-Goel-04

@williballenthin, based on the above results, it would be better to stick with the current capa sigs. What are your thoughts?

Jun 20 '23 06:06 Aayush-Goel-04

Interesting results. Do you have insight into which files rizin handles better than the capa signatures? Maybe we can leverage a subset of the rizin rules?

Jun 20 '23 06:06 mr-tz

The Excel file attached above shows which files rizin handles better. Currently, I don't know of a way to classify the PE files used for testing. Do you have any suggestions for how I can classify the files?

Jun 20 '23 08:06 Aayush-Goel-04

Detect It Easy could shed some light on compiler/linker versions

Jun 20 '23 08:06 mr-tz

Sharing the compiler and linker results found using Detect It Easy. In the Excel file below, empty cells mean there were no results from DIEC. updatedMatches.xlsx

At first look, I wasn't able to detect any patterns in the files that rizin handles better; I will continue looking into it. @mr-tz @williballenthin, could you share your views if you find any patterns in the results in the Excel file above?

Jun 25 '23 15:06 Aayush-Goel-04

Cool, thanks for the research here. Seems like rizin does slightly better on

Microsoft Linker(14.11, Visual Studio 2017 15.3*)[Console32,console]
Microsoft Linker(14.0, Visual Studio 2015 14.0*)[Console64,console]
Microsoft Linker(14.0, Visual Studio 2015 14.0*)[GUI32]
Microsoft Linker(14.26, Visual Studio 2019 16.6*)[Console32,console]
Microsoft Linker(14.0, Visual Studio 2015 14.0*)[GUI64]
Microsoft Linker(14.0, Visual Studio 2015 14.0*)[Console32,console]
Microsoft Linker(14.32, Visual Studio 2022 17.2*)[GUI64]

But overall, our signatures seem to do pretty well and I don't see a reason to change based on this.

Jun 27 '23 07:06 mr-tz

@mr-tz, thanks for reviewing the results. Let me know if you have any further thoughts or if there's anything else regarding this.

Jun 27 '23 12:06 Aayush-Goel-04

@williballenthin, what do you think of these results, can we close this issue (for now)?

Jun 28 '23 06:06 mr-tz

yeah, i agree, let's keep using the signatures that we have; however, if they become noticeably out of date and/or rizin introduces other relevant signatures, let's consider whether it becomes worthwhile to switch over.

@Aayush-Goel-04 thank you very much for taking the time to do this data exploration. although it didn't lead to a merged PR, this is a better outcome - no additional work. i appreciated the way you collected data and presented the results, sharing raw information when we asked for it. 🙇🏼‍♂️

Jun 28 '23 10:06 williballenthin
