Hello from the CCCS! 🍁

Open cccs-kevin opened this issue 1 year ago • 4 comments

We at the Assemblyline project perform our own file identification to ensure files are routed correctly to the corresponding file analysis modules. That is why the magika project is very interesting to us.

We have a set of files used for unit testing whose file types we are confident* in. We ran that set against the magika tool and found some discrepancies; see the attached CSV.
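For readers who want to reproduce this kind of comparison, here is a minimal sketch. It rests on assumptions: the column names `sha256` and `expected_type` are hypothetical (the layout of the attached CSV is not shown here), and `identify` is a stand-in callable for whatever identification tool is being evaluated. The sketch does not call magika itself.

```python
import csv
import io


def find_discrepancies(rows, identify):
    """Return the rows whose expected type differs from the identified type.

    `rows` is an iterable of dicts with hypothetical keys "sha256" and
    "expected_type"; `identify` maps a sha256 string to a type label.
    """
    mismatches = []
    for row in rows:
        predicted = identify(row["sha256"])
        if predicted != row["expected_type"]:
            # Keep the original row and record what the tool reported.
            mismatches.append({**row, "predicted_type": predicted})
    return mismatches


# Toy data for illustration only; these are not real hashes or results.
SAMPLE_CSV = """sha256,expected_type
aaa,pdf
bbb,zip
ccc,javascript
"""


def fake_identify(sha256):
    # Stand-in for a real identification tool (assumption, not magika's API).
    return {"aaa": "pdf", "bbb": "zip", "ccc": "txt"}[sha256]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
mismatches = find_discrepancies(rows, fake_identify)
for m in mismatches:
    print(m["sha256"], m["expected_type"], m["predicted_type"])
```

Passing the identifier in as a callable keeps the comparison logic independent of any particular tool or tool version.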

All of the SHA256 hashes can be found on VirusTotal, and we would love to collaborate (join our Discord!) to improve magika to the point where we can integrate it into Assemblyline :)

AL_MAGIKA_COMP_revised.csv

Cheers, 🇨🇦

cccs-kevin, Feb 16 '24 19:02

Much appreciated! We can add these to our golden dataset and track improvements to the model. Can I assume these are MIT licensed like the rest of Assemblyline?

Btw - I love Assemblyline :)

invernizzi, Feb 19 '24 18:02

Indeed, thank you for taking the time! This is extremely useful. We need to settle a bunch of things after this initial release, and we'll then definitely follow up :-) Thanks again!

reyammer, Feb 19 '24 18:02

For sure! If there's anything we can do to help improve the project, feel free to let us know!

cccs-rs, Feb 19 '24 18:02

> Much appreciated! We can add these to our golden dataset and track improvements to the model. Can I assume these are MIT licensed like the rest of Assemblyline?
>
> Btw - I love Assemblyline :)

Assemblyline itself is MIT-licensed, but the files behind the hashes in that list should not be assumed to be MIT-licensed. They are not owned by Assemblyline; they are just files found on VirusTotal :)

cccs-kevin, Feb 19 '24 18:02