ThreatIngestor Parse images with OCR for further IOC extraction.

Open pedramamini opened this issue 5 years ago • 1 comments

Consider the following Tweets:

https://twitter.com/alphasoc/status/1119360843567681536
https://twitter.com/alphasoc/status/1118254832714797056

Which contain the following image URLs:

https://pbs.twimg.com/media/D4TU-QiUwAA0Lc8.jpg
https://pbs.twimg.com/media/D4jEsmqU8AAMeTp.jpg:large

Retrieve each image, run it through a cloud OCR service (Google, Facebook, AWS), then parse the recognized text with IOCExtract for inclusion in the IOC stream.
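A rough sketch of that flow, for illustration only. Everything here is an assumption rather than ThreatIngestor code: `fetch_image`, `extract_iocs`, and `image_to_iocs` are hypothetical names, the OCR step is passed in as a callable so any provider could be plugged in, and the two regexes are a simplified stand-in for what IOCExtract would do.

```python
import re
import urllib.request
from typing import Callable, List

# Simplified stand-in patterns (assumption); a real pipeline would hand the
# OCR text to IOCExtract instead of these two regexes.
IPV4_RE = re.compile(r"\b(?:\d{1,3}(?:\.|\[\.\])){3}\d{1,3}\b")
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+(?:\.|\[\.\]))+[a-z]{2,}\b", re.IGNORECASE)


def fetch_image(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw image bytes from a URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def extract_iocs(text: str) -> List[str]:
    """Return refanged IPv4 addresses and domains, de-duplicated in order."""
    found: List[str] = []
    for pattern in (IPV4_RE, DOMAIN_RE):
        for match in pattern.findall(text):
            value = match.replace("[.]", ".")  # undo the common [.] defang
            if value not in found:
                found.append(value)
    return found


def image_to_iocs(image_bytes: bytes, ocr: Callable[[bytes], str]) -> List[str]:
    """Run the supplied OCR callable over the image, then extract IOCs."""
    return extract_iocs(ocr(image_bytes))
```

With a real OCR client wrapped as `ocr`, usage would look like `image_to_iocs(fetch_image(url), ocr)`. Keeping OCR behind a callable means the cloud provider can be swapped without touching the extraction step.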

May 20 '19 16:05 pedramamini

This would be a perfect candidate for a new queue worker - you could probably do it with only a few lines changed from the paste processor.

May 20 '19 18:05 rshipp

Here's a more modern example of why this is valuable:

https://www.sentinelone.com/blog/top-10-macos-malware-discoveries-in-2022/

Jan 05 '23 20:01 pedramamini

In the next version of ThreatIngestor, this can be accomplished with a new source specifically for image extraction. This does require some /tmp data to live on the system due to how CV handles the binary data from images, but it should work for both local and external images.

config.yml

sources:
  # Local image file
  - name: image-scrape
    module: image
    img: local.jpg

  # External image, fetched by URL
  - name: image-scrape
    module: image
    img: https://user-images.githubusercontent.com/1253573/210873147-a8fdbc59-2bbf-4c56-af6d-d01503aabb93.png

Command

threatingestor config.yml
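For anyone curious about the /tmp requirement mentioned above, here is a minimal sketch of the general pattern, not the actual source module: the image bytes are staged in the system temp directory so a path-based OCR or CV routine can read them, then removed. `ocr_via_tempfile` and the `ocr_path` callable are hypothetical names used only for illustration.

```python
import os
import tempfile
from typing import Callable


def ocr_via_tempfile(
    image_bytes: bytes,
    ocr_path: Callable[[str], str],
    suffix: str = ".img",
) -> str:
    """Write the bytes to a temp file, run a path-based OCR callable, clean up."""
    # mkstemp creates the file in the system temp directory (/tmp on most Linux hosts).
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(image_bytes)
        # The file is closed here, so the OCR routine can open it by path.
        return ocr_path(path)
    finally:
        # Remove the staged file even if OCR raises.
        os.remove(path)
```

The same helper would cover both cases in the config: bytes read from a local file or bytes downloaded from an external URL.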

Jan 24 '23 23:01 battleoverflow
