flare-floss

qs: automate the construction of databases

Open williballenthin opened this issue 1 year ago • 3 comments

via #761 and @r0ny123

For example, to build the expert db, we can use GitHub CI to automatically add the strings from capa rules whenever a rule with a string is added or updated in the capa rules repo.
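As a rough illustration of the "collect strings from a rule" step, here is a minimal sketch. It assumes the rule has already been parsed from YAML into nested dicts and lists, and that string features appear under a `string` key. The function name `iter_rule_strings` and the example rule are hypothetical, not taken from capa or FLOSS.

```python
from typing import Any, Iterator


def iter_rule_strings(node: Any) -> Iterator[str]:
    """Recursively yield the value of every `string` key in a parsed rule."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "string" and isinstance(value, str):
                yield value
            else:
                # Descend into nested statements such as and/or blocks.
                yield from iter_rule_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_rule_strings(item)


# Hypothetical rule, already loaded from YAML (assumption: this shape
# only approximates a real capa rule).
example_rule = {
    "rule": {
        "meta": {"name": "example rule"},
        "features": [
            {
                "and": [
                    {"string": "SeDebugPrivilege"},
                    {
                        "or": [
                            {"api": "OpenProcessToken"},
                            {"string": "kernel32.dll", "description": "library name"},
                        ]
                    },
                ]
            }
        ],
    }
}

strings = sorted(set(iter_rule_strings(example_rule)))
print(strings)
```

A CI job could run something like this over changed rule files and propose the resulting strings as candidates, rather than writing them straight into the database.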

williballenthin, Jun 12 '23 09:06

For the #common database, this took many hours to build: a dozen hours to fetch the samples from VT, a few hours to extract strings, and a few hours to index the results. I'm not sure this would fit within our GH Actions limits. I'm also not sure how frequently this data is likely to change, though it's certainly worth investigating.
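To make the extract and index steps concrete, here is a minimal sketch. It is an assumption-laden illustration, not how the #common database was actually built: it only handles printable ASCII runs of at least four characters, and `extract_strings` and `index_prevalence` are hypothetical names. Fetching samples from VT is left out entirely.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Runs of at least four printable ASCII bytes (assumed minimum length).
ASCII_STRING = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(data: bytes) -> Set[str]:
    """Return the distinct printable ASCII strings found in one sample."""
    return {m.group().decode("ascii") for m in ASCII_STRING.finditer(data)}


def index_prevalence(samples: Iterable[bytes]) -> Counter:
    """Count how many samples each string appears in."""
    counts: Counter = Counter()
    for data in samples:
        # A set per sample means each sample counts a string at most once.
        counts.update(extract_strings(data))
    return counts


# Two tiny made-up "samples" standing in for real files.
samples = [
    b"\x00\x01GetProcAddress\x00ab\x00",
    b"GetProcAddress\x00LoadLibraryA\x00",
]
prevalence = index_prevalence(samples)
print(prevalence.most_common())
```

Since the expensive part is the download, a periodic job would likely want to cache samples and only process new ones.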

williballenthin, Jun 12 '23 09:06

The #expert database is pre-populated with strings from capa rules; however, this was honestly just a shortcut to get something in there. We would like the #expert database to be something that is super easy for users to update and contribute back to, such as with a small TUI program or a GitHub PR.

I think there are actually many bad entries in the database today from capa, things like "kernel32.dll" etc. So I'm hesitant to keep pulling these strings from capa automatically. Maybe we can tag updates to capa-rules with follow-up actions to manually update the #expert database when a good string is found?
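One way to support that kind of manual follow-up is to pre-sort candidate strings so a reviewer only sees the plausible ones. The sketch below is purely illustrative: the heuristics (a minimum length and a library-filename pattern) and the names `looks_generic` and `triage` are assumptions, not an agreed policy.

```python
import re
from typing import Iterable, List, Tuple

# Assumed heuristic: bare module names like "kernel32.dll" are too generic.
LIBRARY_NAME = re.compile(r"^[\w.-]+\.(dll|sys|exe)$", re.IGNORECASE)


def looks_generic(s: str, min_length: int = 6) -> bool:
    """Guess whether a string is too generic to be an expert entry."""
    return len(s) < min_length or LIBRARY_NAME.match(s) is not None


def triage(candidates: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split candidates into (needs human review, skipped as generic)."""
    review: List[str] = []
    skipped: List[str] = []
    for s in candidates:
        (skipped if looks_generic(s) else review).append(s)
    return review, skipped


review, skipped = triage(["kernel32.dll", "SeDebugPrivilege", "cmd"])
print("review:", review)
print("skipped:", skipped)
```

The final decision stays with a person; the filter only trims the obvious noise before a string is proposed for the #expert database.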

williballenthin, Jun 12 '23 09:06

> For the #common database, this took many hours to build: a dozen hours to fetch the samples from VT, a few hours to extract strings, and a few hours to index the results. I'm not sure this would fit within our GH Actions limits. I'm also not sure how frequently this data is likely to change, though it's certainly worth investigating.

We can fetch that info from VT on a weekly or monthly basis. Regarding the GitHub Actions limits, we could leverage a cloud platform such as AWS. Actually, I like how OALabs/hashdb leverages that.

> Maybe we can tag updates to capa-rules with follow-up actions to manually update the #expert database when a good string is found?

This is a good idea!

r0ny123, Jun 27 '23 16:06