TAXII Collector bot and STIX Parser bot

Open laciKE opened this issue 8 months ago • 9 comments

At a minimum, the TAXII Collector currently collects only objects of type indicator. These objects describe indicators and their detection patterns, e.g. in stix, pcre, sigma, snort, suricata or yara format. The pattern, pattern_type and valid_from properties are required, while confidence, description and labels are optional. However, these optional properties are present in several TAXII feeds and could be used to determine classification.taxonomy and classification.type even without processing the relationships of the indicators (e.g. indicator indicates malware).
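The selection described above could look roughly like the following. This is only a sketch and not the collector's actual code: the function name `select_indicators`, the sample envelope and its identifiers are made up for illustration, and it assumes the server response has already been decoded into a dict with an `objects` list.

```python
# Properties named in the description above.
REQUIRED = ("pattern", "pattern_type", "valid_from")
OPTIONAL = ("confidence", "description", "labels")


def select_indicators(envelope):
    """Return the indicator objects that carry all required properties."""
    selected = []
    for obj in envelope.get("objects", []):
        if obj.get("type") != "indicator":
            continue  # other object types (malware, relationship, ...) are skipped
        if all(key in obj for key in REQUIRED):
            selected.append(obj)
    return selected


# Hypothetical sample data, not taken from any real feed.
envelope = {
    "objects": [
        {
            "type": "indicator",
            "spec_version": "2.1",
            "id": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
            "pattern": "[ipv4-addr:value = '198.51.100.1']",
            "pattern_type": "stix",
            "valid_from": "2025-04-29T00:00:00Z",
            "labels": ["malicious-activity"],
        },
        {
            "type": "malware",
            "spec_version": "2.1",
            "id": "malware--0c7b5b88-8ff7-4a4d-aa9d-feb398cd0061",
            "name": "example",
        },
    ]
}

indicators = select_indicators(envelope)
present_optional = [key for key in OPTIONAL if key in indicators[0]]
```

Here `present_optional` shows which of the optional properties a later classification step could rely on for that indicator.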

The STIX Parser can currently parse objects of type indicator (usually retrieved by the TAXII Collector). From each indicator object it extracts the detection pattern (currently only single Observation Expressions in STIX format are supported). It supports IP addresses, domains and URLs as indicator values. The parser also attempts to extract some optional properties of STIX objects, such as description and labels, which can be useful for further classification of the event by Expert Bots.
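To illustrate what parsing a single Observation Expression involves, here is a minimal sketch. It is not the parser from this pull request: the regular expression, the `FIELD_MAP` table and the choice of `source.*` fields are assumptions for the example, and escaped characters inside the quoted value are matched but not unescaped.

```python
import re

# Matches one comparison such as [ipv4-addr:value = '198.51.100.1'].
# Compound expressions (AND, OR, FOLLOWEDBY) deliberately do not match.
SIMPLE_PATTERN = re.compile(
    r"^\s*\[\s*(?P<type>[a-z0-9-]+):(?P<path>[a-z_]+)\s*=\s*"
    r"'(?P<value>(?:[^'\\]|\\.)*)'\s*\]\s*$"
)

# Assumed mapping from STIX object type and property to an event field.
FIELD_MAP = {
    ("ipv4-addr", "value"): "source.ip",
    ("ipv6-addr", "value"): "source.ip",
    ("domain-name", "value"): "source.fqdn",
    ("url", "value"): "source.url",
}


def parse_simple_pattern(pattern):
    """Return (event_field, value) for a supported pattern, else None."""
    match = SIMPLE_PATTERN.match(pattern)
    if match is None:
        return None
    field = FIELD_MAP.get((match.group("type"), match.group("path")))
    if field is None:
        return None
    return field, match.group("value")
```

For example, `parse_simple_pattern("[domain-name:value = 'example.com']")` yields the pair `("source.fqdn", "example.com")`, while a pattern joining two comparisons with `OR` yields `None`.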

  • TAXII Collector tests for missing parameters, with a mock of a simple TAXII server providing a minimal collection with a simple indicator object
  • STIX Parser tests for indicator pattern parsing
  • Improvements based on @sebix comments, collection title used as feed.code
  • Fix codestyle in TAXII and STIX bots
  • Fix Python 3.8 support in STIX Parser bot

laciKE avatar Apr 29 '25 23:04 laciKE

The TAXII and STIX bots are currently tested with the ESET Threat Intelligence (ETI) feeds. Recently, ETI added several new feeds which are available only via TAXII/STIX 2.1, and older ESETCollectorBot and ESETParserBot cannot handle them.

I am working on an Expert Bot for classifying events from ETI, and I would like to publish it when it is ready, together with the feeds in feeds.yaml.

laciKE avatar Apr 29 '25 23:04 laciKE

Hello, I have a question regarding the proposal from the last commit.

I created ESETExpertBot which can add the proper classification.type and malware.name (if possible) to the events produced by StixParserBot. Ref: https://github.com/laciKE/intelmq/blob/eset/intelmq/bots/experts/eset/expert.py

When I tried to add the ESET Threat Intelligence TAXII feeds to feeds.yaml together with the expert bot, the tests failed, because it seems that expert bots are not allowed in feeds.yaml.

Especially with the TAXII feeds, three bots will be needed to ingest those feeds:

  • Collect STIX objects from TAXII server (generic TAXII Collector)
  • Parse generic STIX indicator objects (generic STIX Parser)
  • Apply vendor-specific enrichment of events based on optional STIX properties used by the particular vendor (vendor-specific Expert bot).

As far as I understand, two parsers cannot be chained in the pipeline (because a parser's input is a Report and its output is an Event). What is the suggested way to do a three-step ingestion in cases like this? One generic Parser bot for the given format, with all vendor-specific bots inheriting from that generic parser bot?

laciKE avatar May 01 '25 23:05 laciKE

From what I understand from reading the code, the ESET expert fixes the classification for all events coming from the ESET feed. That logic should be in the Parser instead. Or is the code of the ESET expert also useful for sources other than ESET?

sebix avatar May 02 '25 09:05 sebix

Thank you for your answer. You are right, that expert bot fixes the classification and it is ESET-specific. I will change it to a parser bot that inherits from the StixParserBot from this pull request. After that, I will add the commits with "EsetStixParserBot" to this pull request.
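The inheritance approach can be sketched with plain Python classes. These are stand-ins, not the IntelMQ ParserBot API and not the code in this pull request: the class names, the `classify` hook and the label-to-type table are all hypothetical, and the labels are invented rather than taken from any vendor feed.

```python
class GenericStixParser:
    """Stand-in for a generic parser that knows nothing about vendors."""

    def parse_indicator(self, indicator):
        event = {"extra.pattern": indicator["pattern"]}
        if "description" in indicator:
            event["event_description.text"] = indicator["description"]
        if "labels" in indicator:
            event["extra.labels"] = list(indicator["labels"])
        # Classification is a separate hook so subclasses can override it.
        return self.classify(indicator, event)

    def classify(self, indicator, event):
        event["classification.type"] = "undetermined"
        return event


class VendorStixParser(GenericStixParser):
    """Stand-in for a vendor-specific child class."""

    # Hypothetical vendor labels mapped to classification types.
    LABEL_TO_TYPE = {"botnet": "c2-server", "phishing": "phishing"}

    def classify(self, indicator, event):
        for label in indicator.get("labels", []):
            ctype = self.LABEL_TO_TYPE.get(label)
            if ctype is not None:
                event["classification.type"] = ctype
                return event
        # Fall back to the generic behaviour for unknown labels.
        return super().classify(indicator, event)
```

With this split, the generic parsing stays in one place and each vendor only overrides the classification step, so a single parser bot per feed is enough in the pipeline.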

laciKE avatar May 02 '25 19:05 laciKE

Ah, I see. That parser also works for multiple sources, other than ESET?

sebix avatar May 02 '25 19:05 sebix

Yes, this StixParserBot should work for any source that provides Threat Intelligence data in STIX 2.1 format. I created it from scratch by reading the STIX 2.1 documentation, and it is able to parse Indicator objects with simple patterns.

StixParserBot (and TaxiiCollectorBot) should be used with any TAXII/STIX 2.1 feed. General parsing of indicators works, but for correct classification, the vendor-specific bot is needed. This is why I asked what is the proper way to do it.

So far I have tested TaxiiCollectorBot+StixParserBot only with ESET Threat Intelligence TAXII feeds, because I do not have access to other TAXII 2.1 feeds. For correct classification, I created the ESETExpertBot, which I am going to change to ESETStixParserBot (it will be a child class of the generic StixParserBot).

laciKE avatar May 02 '25 21:05 laciKE

I will try to do better parsing for STIX2 patterns.

Also, in ESET Threat Intelligence, domains are sometimes reported in the URL feed and IP addresses in the Domain feed, which causes InvalidValue exceptions in the produced events. I will try to address this, at least by discarding those indicators without raising exceptions (raise_failure=False).
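One way to picture the discarding step, independent of how the bot ends up doing it: check what kind of value an indicator really holds and silently drop those that do not match the feed. The helper names below are hypothetical and the check is deliberately crude (anything that is neither an IP address nor a URL with scheme and host is treated as a domain).

```python
import ipaddress
from urllib.parse import urlsplit


def value_kind(value):
    """Roughly classify an indicator value as 'ip', 'url' or 'fqdn'."""
    try:
        ipaddress.ip_address(value)
        return "ip"
    except ValueError:
        pass
    try:
        parts = urlsplit(value)
    except ValueError:
        return "fqdn"  # crude fallback for strings urlsplit rejects
    if parts.scheme and parts.netloc:
        return "url"
    return "fqdn"


def keep_matching(values, expected_kind):
    """Drop values of the wrong kind instead of raising an exception."""
    return [value for value in values if value_kind(value) == expected_kind]
```

For a URL feed, `keep_matching(values, "url")` would keep `http://example.com/a` and drop a bare `example.org`; for a domain feed, `keep_matching(values, "fqdn")` would drop `198.51.100.7`.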

laciKE avatar May 23 '25 21:05 laciKE

Better parsing for STIX2 patterns is ready; the STIX parser bot can now also extract hashes.
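As a rough illustration of hash extraction from a STIX pattern (a sketch under assumptions, not the bot's implementation): the regular expression and the `extract_hashes` helper are made up for this example, and only MD5, SHA-1 and SHA-256 are mapped to the malware.hash.* event fields.

```python
import re

# Finds comparisons such as file:hashes.'SHA-256' = '<hex>' or file:hashes.MD5 = '<hex>'.
HASH_PATTERN = re.compile(
    r"file:hashes\.(?:'(?P<quoted>[^']+)'|(?P<bare>[A-Za-z0-9_-]+))"
    r"\s*=\s*'(?P<value>[0-9a-fA-F]+)'"
)

HASH_FIELDS = {
    "MD5": "malware.hash.md5",
    "SHA-1": "malware.hash.sha1",
    "SHA-256": "malware.hash.sha256",
}


def extract_hashes(pattern):
    """Return a dict of event field -> lowercase hex digest found in the pattern."""
    found = {}
    for match in HASH_PATTERN.finditer(pattern):
        algorithm = (match.group("quoted") or match.group("bare")).upper()
        field = HASH_FIELDS.get(algorithm)
        if field is not None:
            found[field] = match.group("value").lower()
    return found
```

Note that this sketch ignores whether the comparisons are joined by AND or OR; it simply collects every hash it recognises, which is only appropriate when all of them describe the same file.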

Above-mentioned issues with ESET Threat Intelligence fixed.

From my side, the PR is ready for review. If I should change something or if I forgot to do something, please, let me know, this is my first PR to IntelMQ.

laciKE avatar May 29 '25 00:05 laciKE

I rebased on current develop and added the changelog entry. Review to come soon :)

sebix avatar Jul 18 '25 11:07 sebix