Added support for JSON containing multiple events
Currently, intelmq.bots.parsers.json.parser can only parse either a single event in JSON or multiple events in JSON, each on its own line.
This PR adds a boolean config option, multiple_events, that lets the parser handle multiple events contained in a single JSON document.
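To make the three input shapes concrete, here is a minimal standalone sketch, not the bot's actual code. The function name `split_events` and the `one_per_line` flag are hypothetical; only `multiple_events` is taken from the description above.

```python
import json


def split_events(raw: str, multiple_events: bool = False,
                 one_per_line: bool = False) -> list:
    """Return the event dicts found in a raw JSON report (illustrative only)."""
    if multiple_events:
        # The whole report is one JSON document holding an array of events.
        data = json.loads(raw)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of events")
        return data
    if one_per_line:
        # Each non-empty line is its own JSON document.
        return [json.loads(line) for line in raw.splitlines() if line.strip()]
    # Default: the whole report is a single event.
    return [json.loads(raw)]
```

With `multiple_events=True`, a report such as `[{"source.ip": "127.1.2.1"}, {"source.ip": "127.1.2.2"}]` would yield two events instead of failing or being treated as one.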
Do you have an example feed at hand (so we can extract an example for the tests, add it to the docs)? To my knowledge no documented feed is using such a format.
Our National Cyber Security Centre (NCSC) is sending us "IntelMQ JSON's" in a ZIP-file by mail. The ZIP-file contains a single JSON-file.
Here's an example (I tried to anonymise most values):
[
{
"extra.dataset_collections": "0",
"extra.dataset_files": "1",
"extra.dataset_infected": "false",
"extra.dataset_ransom": "null",
"extra.dataset_rows": "0",
"extra.dataset_size": "301",
"protocol.application": "https",
"protocol.transport": "tcp",
"source.asn": 12345689,
"source.fqdn": "fqdn-example-1.tld",
"source.geolocation.cc": "NL",
"source.geolocation.city": "Enschede",
"source.geolocation.latitude": 52.0000000000000,
"source.geolocation.longitude": 6.0000000000000,
"source.geolocation.region": "Overijssel",
"source.ip": "127.1.2.1",
"source.network": "127.1.0.0/16",
"source.port": 80,
"time.source": "2024-12-16T02:08:06+00:00"
},
{
"extra.dataset_collections": "0",
"extra.dataset_files": "1",
"extra.dataset_infected": "false",
"extra.dataset_ransom": "null",
"extra.dataset_rows": "0",
"extra.dataset_size": "615",
"extra.os_name": "Ubuntu",
"extra.software": "Apache",
"extra.tag": "rescan",
"extra.version": "2.4.58",
"protocol.application": "https",
"protocol.transport": "tcp",
"source.asn": 12345689,
"source.fqdn": "fqdn-example-2.tld",
"source.geolocation.cc": "NL",
"source.geolocation.city": "Eindhoven",
"source.geolocation.latitude": 51.0000000000000,
"source.geolocation.longitude": 5.0000000000000,
"source.geolocation.region": "North Brabant",
"source.ip": "127.1.2.2",
"source.network": "127.1.0.0/16",
"source.port": 443,
"time.source": "2024-12-16T02:08:12+00:00"
},
{
"extra.dataset_collections": "0",
"extra.dataset_files": "1",
"extra.dataset_infected": "false",
"extra.dataset_ransom": "null",
"extra.dataset_rows": "0",
"extra.dataset_size": "421",
"protocol.application": "http",
"protocol.transport": "tcp",
"source.asn": 12345689,
"source.geolocation.cc": "NL",
"source.geolocation.city": "Enschede",
"source.geolocation.latitude": 52.0000000000000,
"source.geolocation.longitude": 6.0000000000000,
"source.geolocation.region": "Overijssel",
"source.ip": "127.1.2.3",
"source.network": "127.1.0/16",
"source.port": 9000,
"time.source": "2024-12-15T21:09:49+00:00"
}
]
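For anyone who wants to reproduce this delivery format locally, the following is a rough sketch using only the Python standard library. It assumes the ZIP file holds exactly one JSON file whose top level is an array, as described above; `events_from_zip` and the file name `report.json` are made up for illustration and are not part of IntelMQ.

```python
import io
import json
import zipfile


def events_from_zip(zip_bytes: bytes) -> list:
    """Read the single JSON file inside a ZIP archive and return its events."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
        if len(names) != 1:
            raise ValueError("expected exactly one file in the archive")
        with archive.open(names[0]) as handle:
            data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of events")
    return data


# Build a small in-memory archive that mimics the feed, then read it back.
sample = [
    {"source.ip": "127.1.2.1", "source.port": 80},
    {"source.ip": "127.1.2.2", "source.port": 443},
]
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("report.json", json.dumps(sample))

events = events_from_zip(buffer.getvalue())
```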
I added a few changes here:
- Test cases
  - that's where I noticed that the json parser didn't add the required classification.type field if it doesn't exist in the input data, so I added that as well
- The optimizations as discussed above
  - which also revealed another bug in Message.from_dict, which modified the parameter passed to it
- Documentation
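The two fixes in the list above can be illustrated with a small standalone sketch. It is not the code from this PR: the helper name `with_default_classification` is hypothetical, and the default value `"undetermined"` is an assumption chosen for the example. The point is that the missing field is filled in on a copy, so the caller's dict is left unmodified.

```python
def with_default_classification(event: dict,
                                default_type: str = "undetermined") -> dict:
    """Return a copy of the event with classification.type set if missing."""
    # Copy first so the dict passed in by the caller is never modified.
    result = dict(event)
    # Only fill the field when the input data does not provide it.
    result.setdefault("classification.type", default_type)
    return result


original = {"source.ip": "127.1.2.1"}
completed = with_default_classification(original)
```

After the call, `completed` carries the extra field while `original` still has only `source.ip`.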
...and found & fixed another bug in intelmq.lib.message.Message.from_dict: it now raises a ValueError if the message type cannot be determined.
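As a rough illustration of that behaviour, here is a standalone sketch rather than the real Message.from_dict. It assumes the type is carried in a `__type` key of the dict and may otherwise be supplied as a default; `resolve_message_type` is a hypothetical name.

```python
def resolve_message_type(message: dict, default_type=None) -> tuple:
    """Return (message type, remaining fields) without touching the input."""
    # Work on a copy so removing the type key does not affect the caller.
    fields = dict(message)
    msg_type = fields.pop("__type", default_type)
    if msg_type is None:
        # Neither the data nor the caller told us what kind of message this is.
        raise ValueError("Message type could not be determined.")
    return msg_type, fields
```

Failing early with a ValueError makes the problem visible at the point where the dict is converted, instead of surfacing later as a less specific error.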
As I wrote a major part of this PR, I won't merge it myself
@aaronkaplan could you do the review instead?
Rebased on develop to fix conflicts
@kamil-certat maybe you can have a look?