ThreatIngestor
Parse images with OCR for further IOC extraction.
Consider the following Tweets:
- https://twitter.com/alphasoc/status/1119360843567681536
- https://twitter.com/alphasoc/status/1118254832714797056
Which contain the following image URLs:
- https://pbs.twimg.com/media/D4TU-QiUwAA0Lc8.jpg
- https://pbs.twimg.com/media/D4jEsmqU8AAMeTp.jpg:large
Retrieve each image, run it through a cloud OCR service (Google, Facebook, AWS), then parse the recognized text with IOCExtract for inclusion in the IOC stream.
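A rough sketch of that fetch, OCR, extract flow is below. It is only an illustration, not ThreatIngestor code: `iocs_from_image`, `find_iocs`, and `fetch_image` are hypothetical names, the OCR step is passed in as a plain callable so any provider can be plugged in, and the two regexes are a simplified stand-in for what IOCExtract would do.

```python
import re
import urllib.request
from typing import Callable, List

# Simplified stand-in patterns (assumption): a real implementation would hand
# the OCR text to IOCExtract instead of using these two regexes.
URL_RE = re.compile(r"\bh(?:tt|xx)ps?://[^\s\"'<>]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}(?:\[\.\]|\.)){3}\d{1,3}\b")


def fetch_image(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw image bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def refang(ioc: str) -> str:
    """Undo two common defanging tricks: '[.]' and lowercase 'hxxp'."""
    return ioc.replace("[.]", ".").replace("hxxp", "http")


def find_iocs(text: str) -> List[str]:
    """Return refanged URLs, then IPv4 addresses, de-duplicated in order."""
    found: List[str] = []
    for pattern in (URL_RE, IPV4_RE):
        for match in pattern.findall(text):
            ioc = refang(match)
            if ioc not in found:
                found.append(ioc)
    return found


def iocs_from_image(
    url: str,
    ocr: Callable[[bytes], str],
    fetch: Callable[[str], bytes] = fetch_image,
) -> List[str]:
    """Fetch an image, run the supplied OCR callable, and extract IOCs."""
    image_bytes = fetch(url)
    text = ocr(image_bytes)
    return find_iocs(text)
```

Keeping the OCR step as a parameter means the same function works with a cloud API wrapper or a local engine, and it can be exercised offline by passing a stub that returns canned text.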
This would be a perfect candidate for a new queue worker. It could probably be done with only a few lines changed from the paste processor.
Here's a more modern example of why this is valuable:
https://www.sentinelone.com/blog/top-10-macos-malware-discoveries-in-2022/
In the next version of ThreatIngestor, this can be accomplished with a new source specifically for image extraction. It does require some temporary data to live under /tmp on the system because of how CV handles the binary data from images, but it should work for both local and external images.
config.yml
sources:
  - name: image-scrape
    module: image
    img: local.jpg
  - name: image-scrape
    module: image
    img: https://user-images.githubusercontent.com/1253573/210873147-a8fdbc59-2bbf-4c56-af6d-d01503aabb93.png
Command
threatingestor config.yml
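For anyone curious what the /tmp requirement might look like in practice, here is a minimal sketch of staging an `img` value (local path or URL) into a temporary file before handing it to an image library. This is an assumption about the general approach, not the actual source module: `stage_image` is a hypothetical helper and only the Python standard library is used.

```python
import os
import shutil
import tempfile
import urllib.request
from typing import Optional
from urllib.parse import urlparse


def stage_image(img: str, tmp_dir: Optional[str] = None) -> str:
    """Copy a local or remote image into a temp file and return its path.

    The caller is responsible for deleting the returned file when done.
    """
    parsed = urlparse(img)
    is_remote = parsed.scheme in ("http", "https")

    # Keep the file extension so image libraries can guess the format.
    # Drop anything after ':' to cope with size suffixes such as ':large'.
    source_name = parsed.path if is_remote else img
    suffix = os.path.splitext(source_name)[1].split(":")[0]

    fd, tmp_path = tempfile.mkstemp(suffix=suffix, dir=tmp_dir)
    with os.fdopen(fd, "wb") as out:
        if is_remote:
            with urllib.request.urlopen(img, timeout=10) as resp:
                shutil.copyfileobj(resp, out)
        else:
            with open(img, "rb") as src:
                shutil.copyfileobj(src, out)
    return tmp_path
```

With a helper like this, both `img: local.jpg` and an `img:` URL end up as a real file on disk, which is the form that path-based image loaders expect.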