flare-floss
qs: automate the construction of databases
via #761 and @r0ny123
For example, to build the expert db, we could use GitHub CI to automatically add the strings from capa rules whenever a rule containing a string is added or updated in the capa-rules repo.
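To make that concrete, here is a rough sketch of the extraction step such a CI job could run. It assumes a local checkout of capa-rules under ./capa-rules and PyYAML; the walk over nested feature blocks is simplified and the output handling is hypothetical.

```python
# sketch: collect string/substring features from capa rule YAML files.
# assumptions: rules checked out at ./capa-rules, PyYAML installed.
from pathlib import Path

import yaml  # pip install pyyaml


def collect_strings(node, found):
    """recursively walk a rule's feature tree and gather string/substring features."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in ("string", "substring") and isinstance(value, str):
                found.add(value)
            else:
                collect_strings(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_strings(item, found)


def extract_from_rules(rules_dir: Path) -> set[str]:
    strings = set()
    for path in rules_dir.rglob("*.yml"):
        doc = yaml.safe_load(path.read_text(encoding="utf-8")) or {}
        collect_strings(doc.get("rule", {}).get("features", []), strings)
    return strings


if __name__ == "__main__":
    for s in sorted(extract_from_rules(Path("./capa-rules"))):
        print(s)
```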
For the #common database, this took many hours to build: a dozen hours to fetch the samples from VT, a few hours to extract strings, and a few hours to index the results. I'm not sure this would fit within our GitHub Actions limits. I'm also not sure how frequently this data is likely to change, though it's certainly worth investigating.
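For reference, the "extract strings, then index the results" steps look roughly like the sketch below; the sample directory, minimum string length, and output format are placeholders, not the actual #common database schema.

```python
# sketch: extract printable ASCII strings from downloaded samples and rank
# them by how many distinct samples they occur in.
# assumptions: samples already fetched to ./samples; output format is illustrative.
import json
import re
from collections import Counter
from pathlib import Path

ASCII_STRING = re.compile(rb"[\x20-\x7e]{6,}")  # printable runs of 6+ bytes


def strings_in(path: Path) -> set[str]:
    data = path.read_bytes()
    return {m.group().decode("ascii") for m in ASCII_STRING.finditer(data)}


def build_index(samples_dir: Path) -> Counter:
    # count the number of distinct samples each string occurs in,
    # so very common strings rank highest
    counts = Counter()
    for sample in samples_dir.iterdir():
        if sample.is_file():
            counts.update(strings_in(sample))
    return counts


if __name__ == "__main__":
    index = build_index(Path("./samples"))
    with open("common-strings.jsonl", "w", encoding="utf-8") as f:
        for s, n in index.most_common():
            f.write(json.dumps({"string": s, "count": n}) + "\n")
```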
The #expert database is pre-populated with strings from capa rules; however, this was honestly just a shortcut to get something in there. We would like the #expert database to be something that is very easy for users to update and contribute back to, such as via a small TUI program or a GitHub PR.
Actually, I think there are many bad entries in the database today from capa, things like "kernel32.dll", etc. So I'm hesitant to keep pulling these strings from capa automatically. Maybe we can tag updates to capa-rules with follow-up actions to manually update the #expert database when a good string is found?
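A filtering pass along these lines could screen out the obviously generic entries before they land in the #expert database; the denylist and heuristics below are illustrative only, not a settled policy.

```python
# sketch: reject strings that are too generic to be useful expert entries.
# the denylist entries and heuristics here are examples, not a real policy.
import re

DENYLIST = {
    "kernel32.dll",
    "ntdll.dll",
    "advapi32.dll",
}

MODULE_NAME = re.compile(r"^[\w.-]+\.(dll|exe|sys)$", re.IGNORECASE)


def is_expert_quality(s: str) -> bool:
    """keep only strings likely to be distinctive enough for the #expert database."""
    if len(s) < 6:
        return False
    if s.lower() in DENYLIST:
        return False
    if MODULE_NAME.match(s):
        # bare module names show up in nearly every sample and carry little signal
        return False
    return True


candidates = ["kernel32.dll", "cmd.exe /c ping 127.0.0.1 -n 3", "VirtualAlloc"]
print([s for s in candidates if is_expert_quality(s)])
```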
> For the #common database, this took many hours to build: a dozen hours to fetch the samples from VT, a few hours to extract strings, and a few hours to index the results. I'm not sure this would fit within our GitHub Actions limits. I'm also not sure how frequently this data is likely to change, though it's certainly worth investigating.
We can fetch that info from VT on a weekly or monthly basis. Regarding the GitHub Actions limits, we can leverage cloud platforms like AWS, etc. Actually, I like how OALabs/hashdb leverages that approach.
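As a rough illustration of the periodic fetch, something like the following could run on a weekly schedule (whether in a scheduled GitHub Actions workflow or a small cloud job). This assumes vt-py and a VT key with file-download privileges; the hash list and paths are placeholders.

```python
# sketch: refresh locally cached samples from VT on a schedule.
# assumptions: vt-py installed, VT_API_KEY set, hashes.txt holds one sha256 per line.
import os
from pathlib import Path

import vt  # pip install vt-py

HASHES = Path("hashes.txt")   # placeholder: one sha256 per line
OUT = Path("./samples")


def fetch_samples():
    OUT.mkdir(exist_ok=True)
    client = vt.Client(os.environ["VT_API_KEY"])
    try:
        for sha256 in HASHES.read_text().split():
            dest = OUT / sha256
            if dest.exists():
                continue  # only fetch samples we don't already have
            with open(dest, "wb") as f:
                client.download_file(sha256, f)
    finally:
        client.close()


if __name__ == "__main__":
    fetch_samples()
```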
> Maybe we can tag updates to capa-rules with follow-up actions to manually update the #expert database when a good string is found?
This is a good idea!