Added support for JSON containing multiple events

Open tim427 opened this issue 1 year ago • 1 comments

Currently, intelmq.bots.parsers.json.parser can only parse either a single event in JSON or multiple events in JSON with each event on its own line.

This PR adds an option to parse multiple events contained in a single JSON document, enabled by setting the new boolean parameter multiple_events in the config.
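The three input shapes described above can be sketched roughly as follows. This is a minimal illustration, not the parser's actual code: the function name iter_events and the one_per_line flag are hypothetical, and only multiple_events is taken from this PR.

```python
import json
from typing import Any, Dict, Iterator


def iter_events(raw: str, one_per_line: bool = False,
                multiple_events: bool = False) -> Iterator[Dict[str, Any]]:
    """Yield one dict per event from a raw JSON report (hypothetical helper)."""
    if multiple_events:
        # The whole report is one JSON document holding a list of events.
        data = json.loads(raw)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of events")
        yield from data
    elif one_per_line:
        # One JSON object per non-empty line.
        for line in raw.splitlines():
            if line.strip():
                yield json.loads(line)
    else:
        # The whole report is a single event.
        yield json.loads(raw)
```

Keeping the array case behind an explicit flag means existing configurations keep their current behaviour unless they opt in.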

tim427 avatar Dec 11 '24 13:12 tim427

Do you have an example feed at hand, so we can extract an example for the tests and add it to the docs? To my knowledge, no documented feed uses such a format.

Our National Cyber Security Centre (NCSC) sends us "IntelMQ JSONs" in a ZIP file by mail. The ZIP file contains a single JSON file.

Here's an example (I tried to anonymise most values):

[
    {
        "extra.dataset_collections": "0",
        "extra.dataset_files": "1",
        "extra.dataset_infected": "false",
        "extra.dataset_ransom": "null",
        "extra.dataset_rows": "0",
        "extra.dataset_size": "301",
        "protocol.application": "https",
        "protocol.transport": "tcp",
        "source.asn": 12345689,
        "source.fqdn": "fqdn-example-1.tld",
        "source.geolocation.cc": "NL",
        "source.geolocation.city": "Enschede",
        "source.geolocation.latitude": 52.0000000000000,
        "source.geolocation.longitude": 6.0000000000000,
        "source.geolocation.region": "Overijssel",
        "source.ip": "127.1.2.1",
        "source.network": "127.1.0.0/16",
        "source.port": 80,
        "time.source": "2024-12-16T02:08:06+00:00"
    },
    {
        "extra.dataset_collections": "0",
        "extra.dataset_files": "1",
        "extra.dataset_infected": "false",
        "extra.dataset_ransom": "null",
        "extra.dataset_rows": "0",
        "extra.dataset_size": "615",
        "extra.os_name": "Ubuntu",
        "extra.software": "Apache",
        "extra.tag": "rescan",
        "extra.version": "2.4.58",
        "protocol.application": "https",
        "protocol.transport": "tcp",
        "source.asn": 12345689,
        "source.fqdn": "fqdn-example-2.tld",
        "source.geolocation.cc": "NL",
        "source.geolocation.city": "Eindhoven",
        "source.geolocation.latitude": 51.0000000000000,
        "source.geolocation.longitude": 5.0000000000000,
        "source.geolocation.region": "North Brabant",
        "source.ip": "127.1.2.2",
        "source.network": "127.1.0.0/16",
        "source.port": 443,
        "time.source": "2024-12-16T02:08:12+00:00"
    },
    {
        "extra.dataset_collections": "0",
        "extra.dataset_files": "1",
        "extra.dataset_infected": "false",
        "extra.dataset_ransom": "null",
        "extra.dataset_rows": "0",
        "extra.dataset_size": "421",
        "protocol.application": "http",
        "protocol.transport": "tcp",
        "source.asn": 12345689,
        "source.geolocation.cc": "NL",
        "source.geolocation.city": "Enschede",
        "source.geolocation.latitude": 52.0000000000000,
        "source.geolocation.longitude": 6.0000000000000,
        "source.geolocation.region": "Overijssel",
        "source.ip": "127.1.2.3",
        "source.network": "127.1.0.0/16",
        "source.port": 9000,
        "time.source": "2024-12-15T21:09:49+00:00"
    }
]
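For a feed delivered like this (a ZIP archive holding exactly one JSON file with an array of events), unpacking could look roughly like the sketch below. It uses only the Python standard library. The function name events_from_zip is hypothetical, and how the attachment bytes are obtained from the mail is left out.

```python
import io
import json
import zipfile
from typing import Any, Dict, List


def events_from_zip(zip_bytes: bytes) -> List[Dict[str, Any]]:
    """Return the events from a ZIP archive that holds one JSON file."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Ignore directory entries; the archive should hold a single file.
        names = [n for n in archive.namelist() if not n.endswith("/")]
        if len(names) != 1:
            raise ValueError(f"expected exactly one file, found {len(names)}")
        events = json.loads(archive.read(names[0]).decode("utf-8"))
    if not isinstance(events, list):
        raise ValueError("expected a JSON array of events")
    return events
```

Each element of the returned list is one event dict with keys such as source.ip and time.source, as in the example above.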

tim427 avatar Dec 16 '24 11:12 tim427

I added a few changes here:

  • Test cases
    • That's where I noticed that the JSON parser didn't add the required classification.type field if it is missing from the input data, so I added that as well
  • The optimizations as discussed above
    • These also revealed another bug in Message.from_dict, which modified its parameter
  • Documentation
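The two fixes in the list above (filling in a missing classification.type and not modifying the caller's dict) follow a common pattern, sketched below. This is an illustration only, not the code from this PR: the helper name with_default_type is hypothetical, and the fallback value "undetermined" is an assumption.

```python
from typing import Any, Dict

# Assumption: the fallback value is illustrative, not confirmed by this PR.
DEFAULT_TYPE = "undetermined"


def with_default_type(item: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with classification.type filled in if absent."""
    fields = dict(item)  # shallow copy, so the caller's dict stays untouched
    fields.setdefault("classification.type", DEFAULT_TYPE)
    return fields
```

Working on a copy matters most when the same parsed item is reused later, for example in a test that compares the input against the produced event.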

sebix avatar Aug 14 '25 08:08 sebix

...and found and fixed another bug in intelmq.lib.message.Message.from_dict: it now raises a ValueError if the message type is not determinable.
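The behaviour described here can be sketched as below. This is a simplified stand-in, not IntelMQ's implementation: the function split_type is hypothetical, and the "__type" key and the default_type argument are assumptions about how the message type is carried.

```python
from typing import Any, Dict, Optional, Tuple


def split_type(message: Dict[str, Any],
               default_type: Optional[str] = None) -> Tuple[str, Dict[str, Any]]:
    """Return (message type, fields without the type marker)."""
    fields = dict(message)  # copy; the caller's dict is not modified
    # Assumption: the type is stored under "__type", else a default applies.
    message_type = fields.pop("__type", default_type)
    if message_type is None:
        raise ValueError("message type is not determinable")
    return message_type, fields
```

Failing early with a ValueError gives the caller a clear error instead of a less specific failure further down.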

sebix avatar Aug 14 '25 08:08 sebix

As I wrote a major part of this PR, I won't merge it myself

@aaronkaplan could you do the review instead?

sebix avatar Aug 14 '25 09:08 sebix

Rebased on develop to fix conflicts

sebix avatar Aug 25 '25 18:08 sebix

@kamil-certat maybe you can have a look?

sebix avatar Aug 29 '25 11:08 sebix