ATTACK-Python-Client

Missing groups in all_groups() when group is "living" in multiple matrices

Open rubinatorz opened this issue 1 year ago • 1 comments

hi Roberto!

The Enterprise, ICS and Mobile matrices are all included in the STIX objects. Some threat actor groups are active in multiple matrices; for example, Windshift is active in both Enterprise and Mobile. Within the STIX objects this group is included with ID "intrusion-set--afec6dc3-a18e-4b62-b1a4-5510e1a498d1" in Enterprise as well as in Mobile. But when you call get_groups() of attackcti, you only get one of them: the one from Mobile ("x_mitre_domains": ["mobile-attack"]). The following code demonstrates this:

from attackcti import attack_client

client = attack_client(local_path="../cti/")
groups = client.get_groups()

windshift = "intrusion-set--afec6dc3-a18e-4b62-b1a4-5510e1a498d1"

for group in groups:
    if group['id'] == windshift:
        print(group)

OUTPUT (compressed):
{"type": "intrusion-set", "id": "intrusion-set--afec6dc3-a18e-4b62-b1a4-5510e1a498d1", 
"modified": "2021-04-26T14:37:33.234Z", "name": "Windshift", ...................., 
"x_mitre_domains": ["mobile-attack"], ....................}

I dug into this problem because, when getting all groups, I'm now missing the one from Enterprise. I have also seen other groups that are in both Enterprise and ICS, where only the ICS one is returned, and cases where a PRE group is returned (see also my issue from earlier today, #59).

It turned out that the query function of CompositeDataSource deduplicates items:

# remove exact duplicates (where duplicates are STIX 2.0
# objects with the same 'id' and 'modified' values)
if len(all_data) > 0:
    all_data = deduplicate(all_data)
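
To make the effect concrete, here is a minimal sketch in plain Python. The function dedup_by_id_and_modified is a hypothetical stand-in written for this example, not the stix2 implementation, and the two dicts are mock objects. The Enterprise copy is assumed to carry the same modified timestamp as the Mobile one, and letting the later object overwrite the earlier one is also an assumption of this sketch:

```python
def dedup_by_id_and_modified(objects):
    """Keep one object per (id, modified) pair; later objects overwrite earlier ones."""
    unique = {}
    for obj in objects:
        unique[(obj["id"], obj.get("modified"))] = obj
    return list(unique.values())


windshift_id = "intrusion-set--afec6dc3-a18e-4b62-b1a4-5510e1a498d1"

# Mock copies of the same group from two matrices (same id, same modified date).
enterprise_copy = {
    "id": windshift_id,
    "modified": "2021-04-26T14:37:33.234Z",  # assumed equal to the Mobile copy
    "x_mitre_domains": ["enterprise-attack"],
}
mobile_copy = {
    "id": windshift_id,
    "modified": "2021-04-26T14:37:33.234Z",
    "x_mitre_domains": ["mobile-attack"],
}

result = dedup_by_id_and_modified([enterprise_copy, mobile_copy])
print(len(result), result[0]["x_mitre_domains"])
```

Only one of the two copies survives, and which one depends purely on the order in which the sources are combined.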

It deduplicates based on the ID and modified date. In many cases the modified date is the same across matrices too, so groups from the other matrices are dropped. I solved this by calling the query function for each matrix separately and merging the results:

from stix2 import Filter

# mitre is an attack_client instance
groups_enterprise = mitre.TC_ENTERPRISE_SOURCE.query(Filter("type", "=", "intrusion-set"))
groups_ics = mitre.TC_ICS_SOURCE.query(Filter("type", "=", "intrusion-set"))
groups_mobile = mitre.TC_MOBILE_SOURCE.query(Filter("type", "=", "intrusion-set"))
# Plain concatenation: a group present in several matrices appears once per matrix
all_groups = groups_enterprise + groups_ics + groups_mobile

I think it would be good to take this into account in attackcti, especially for the all_groups function. It may also apply to other data objects.
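
If one entry per group is preferred over several copies, the per-matrix results could instead be merged by ID while collecting the domains. The following is only a sketch of that idea on plain dicts: merge_by_id is a hypothetical helper, not part of attackcti or stix2, the sample objects are mock data with made-up IDs for anything other than Windshift, and all fields other than x_mitre_domains are simply taken from the first copy seen.

```python
def merge_by_id(*group_lists):
    """Merge per-matrix lists into one entry per id, with the union of x_mitre_domains."""
    merged = {}
    for groups in group_lists:
        for group in groups:
            # First copy seen provides all other fields; domains are collected below.
            entry = merged.setdefault(group["id"], {**group, "x_mitre_domains": []})
            for domain in group.get("x_mitre_domains", []):
                if domain not in entry["x_mitre_domains"]:
                    entry["x_mitre_domains"].append(domain)
    return list(merged.values())


windshift_id = "intrusion-set--afec6dc3-a18e-4b62-b1a4-5510e1a498d1"

# Mock per-matrix query results (hypothetical data, not real query output).
mock_enterprise = [{"id": windshift_id, "name": "Windshift",
                    "x_mitre_domains": ["enterprise-attack"]}]
mock_ics = [{"id": "intrusion-set--example-ics-only", "name": "Example ICS group",
             "x_mitre_domains": ["ics-attack"]}]
mock_mobile = [{"id": windshift_id, "name": "Windshift",
                "x_mitre_domains": ["mobile-attack"]}]

merged_groups = merge_by_id(mock_enterprise, mock_ics, mock_mobile)
for group in merged_groups:
    print(group["name"], group["x_mitre_domains"])
```

The trade-off is that differences between the copies (for example a differing modified date) are ignored, so this only fits if the copies are otherwise meant to be the same object.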

rubinatorz avatar Sep 16 '22 09:09 rubinatorz

Please note that this behaviour seems to occur only when a local STIX path is given. When using the TAXII server, it also returns just 1 result, but that one contains "x_mitre_domains": ["enterprise-attack"] instead.

So the question is: does get_groups() need to return multiple items for the same actor when it is part of multiple matrices, or does x_mitre_domains need to contain the right information?

Also interesting: when calling the statement below with local STIX data you get 5 items, all with x_mitre_domains=['mobile-attack']. But with TAXII data you get 4 items with x_mitre_domains=['enterprise-attack'] and 1 item with x_mitre_domains=['mobile-attack'].

groups_mobile = mitre.TC_MOBILE_SOURCE.query(Filter("type", "=", "intrusion-set"))

I think the TAXII server is also doing "something" with deduplication, but in a different way...
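
A quick way to compare the two sources is to tally the x_mitre_domains values of a query result. The sketch below uses mock lists that only mirror the counts described above (5 items locally, 4 + 1 via TAXII); count_domains is a hypothetical helper and the objects are placeholders, not real query output.

```python
from collections import Counter


def count_domains(objects):
    """Count how often each x_mitre_domains combination occurs in a result list."""
    return Counter(tuple(obj.get("x_mitre_domains", [])) for obj in objects)


# Mock results shaped like the counts reported for the Mobile source.
mock_local = [{"x_mitre_domains": ["mobile-attack"]} for _ in range(5)]
mock_taxii = (
    [{"x_mitre_domains": ["enterprise-attack"]} for _ in range(4)]
    + [{"x_mitre_domains": ["mobile-attack"]}]
)

local_counts = count_domains(mock_local)
taxii_counts = count_domains(mock_taxii)
print(dict(local_counts))
print(dict(taxii_counts))
```

With real data, the same helper could be applied directly to the list returned by the query on each source.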

rubinatorz avatar Sep 16 '22 12:09 rubinatorz

Very interesting. I have not looked into that before, and based on your examples I believe it is TAXII doing some dedup on their end, to be honest. Hmm. I also believe that x_mitre_domains needs to contain the right information.

Cyb3rWard0g avatar Oct 25 '22 17:10 Cyb3rWard0g