PyMISP icon indicating copy to clipboard operation
PyMISP copied to clipboard

PyMISP tags() method slow response on MISP ver 2.4.131 with 3k tags

Open chrisinmtown opened this issue 5 years ago • 6 comments

The pymisp doc for the tags() method says simply "Get the list of existing tags". I expected this method to be implemented as roughly "select * from misp.tags;" at the database and return answers quickly. Instead, our version 2.4.131 system requires about 100 seconds to return a list of about 3,300 tags, using the default pythonify=False option.

I checked the database where I see queries like this running:

SELECT COUNT(*) AS `count` FROM `misp`.`attributes` AS `Attribute` 
LEFT JOIN `misp`.`events` AS `Event` ON (`Attribute`.`event_id` = `Event`.`id`) 
LEFT JOIN `misp`.`sharing_groups` AS `SharingGroup` ON (`Attribute`.`sharing_group_id` = `SharingGroup`.`id`) 
LEFT JOIN `misp`.`objects` AS `Object` ON (`Attribute`.`object_id` = `Object`.`id`)  
WHERE `Attribute`.`id` IN (39717, 39730, 39751, ..

Here's a sample Tag from the response list. Interestingly, the database column 'numerical_value' is not returned, but other fields are returned. I cannot find any explanation of fields "count" and "attribute_count" but I bet those are the usages of this tag:

{"id": "1221", "name": "malware_classification:malware-category=\"Botnet\"", "colour": "#22681c", "exportable": true,
  "org_id": "0", "hide_tag": false, "user_id": "0", "count": "0", "attribute_count": "0", "favourite": false}

I think the root cause of this issue is probably documentation, not software. If I'm right in guessing that the tags() method's results contain stats for every tag about its usage in events and attributes, then I'll be glad to submit a PR with an extended docstring for the tags() method.

chrisinmtown avatar Oct 30 '20 19:10 chrisinmtown

Alternately, would you accept a feature request to make the behavior of the tags() method configurable? For example, add parameter usage; if True (default) then do the massive join and return the count/attribute fields, otherwise do not.

chrisinmtown avatar Oct 30 '20 19:10 chrisinmtown

@mokaddem @iglocska any idea what do do there? Maybe allow to paginate?

Rafiot avatar Nov 02 '20 11:11 Rafiot

I suggest applying the following patch to explain the tags() method response.

diff --git a/pymisp/api.py b/pymisp/api.py
index 05c3b25..24bbb16 100644
--- a/pymisp/api.py
+++ b/pymisp/api.py
@@ -787,7 +787,9 @@ class PyMISP:
     # ## BEGIN Tags ###
 
     def tags(self, pythonify: bool = False) -> Union[Dict, List[MISPTag]]:
-        """Get the list of existing tags.
+        """Get the list of existing tags and their usage.
+        In the response, each tag has key 'count' with the number of tagged events
+        and key 'attribute_count' with the number of tagged attributes.
 
         :param pythonify: Returns a list of PyMISP Objects instead of the plain json output. Warning: it might use a lot of RAM
         """

chrisinmtown avatar Nov 02 '20 13:11 chrisinmtown

Done for the docstring: https://github.com/MISP/PyMISP/commit/7e84c36406905e8f976571cf0dcf2f3c3e9b5fc3

Thanks!

Rafiot avatar Nov 02 '20 13:11 Rafiot

The improved docstring addresses my question here, thank you, feel free to close this issue. Alternately I could rewrite this issue as a feature request: tags() should accept pagination parameters.

chrisinmtown avatar Nov 02 '20 13:11 chrisinmtown

As it doesn't resolves the performance problem, and that we do not have a clean solution for that now, we will keep it open.

Rafiot avatar Nov 02 '20 13:11 Rafiot