misp-book
Document correlation engine
The misp-book refers to the "correlation engine". For example, in using-the-system/README.md:
Value: The value or value-pair of the attribute. This is the main payload of the attribute, which is described by the category and type columns. For certain types of attributes that are made up of value-pairs the two parts will be split by a pipe (|), such as for filename|md5. The value field(s) are used by the correlation engine to find relations between events. In value-pair attributes both values are correlated individually.
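To make the quoted behaviour concrete, here is a minimal Python sketch of correlating value-pair attributes part by part. It is not MISP's code: the function names, the tuple layout, and the choice to split once on the first pipe are all assumptions made for illustration.

```python
from collections import defaultdict


def correlatable_values(attr_type, value):
    """Return the individual values to correlate for one attribute.

    Assumption: a composite type is recognised by a pipe in its type name
    (e.g. "filename|md5") and its value is split once on the first pipe.
    """
    if "|" in attr_type:
        return value.split("|", 1)
    return [value]


def build_index(attributes):
    """Map each individual value to the set of event ids that contain it."""
    index = defaultdict(set)
    for event_id, attr_type, value in attributes:
        for part in correlatable_values(attr_type, value):
            index[part].add(event_id)
    return index


def related_events(index):
    """Keep only the values that appear in more than one event."""
    return {value: ids for value, ids in index.items() if len(ids) > 1}


# Hypothetical sample data: (event_id, type, value)
attributes = [
    (1, "filename|md5", "invoice.exe|d41d8cd98f00b204e9800998ecf8427e"),
    (2, "md5", "d41d8cd98f00b204e9800998ecf8427e"),
    (3, "filename", "readme.txt"),
]
print(related_events(build_index(attributes)))
```

In this sketch the md5 half of the value-pair in event 1 matches the plain md5 attribute in event 2, so the two events are related even though the full `filename|md5` string never appears in event 2.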
Is the correlation feature an independent part of the MISP system? I noticed that some PHP code directly manipulates the correlations; e.g.,
https://github.com/MISP/MISP/blob/45cfc81de05e95e08d88ee7771258f5cceff8319/app/Model/Attribute.php#L480
https://github.com/MISP/MISP/blob/45cfc81de05e95e08d88ee7771258f5cceff8319/app/Model/Event.php#L522
Anyhow, whether it's a separate engine or not, please describe how the MISP system maintains its correlations table in normal operation:
- how correlations are maintained
- when the engine runs and/or what kicks it off (if there's an engine)
- when new correlations are added after an attribute or event is added; e.g., what's the interval
- when orphan correlations are dropped after an attribute or event is deleted; again what's the interval
- how to debug what the correlation feature is doing; e.g., whether it writes any log file entries
To quickly reply to your questions:
The correlation engine is currently more of a concept than an actual implementation, though we want to make it possible to move it out of the core and use another system.
All correlation-related operations are done via CakePHP's built-in model callback methods (https://book.cakephp.org/2/en/models/callback-methods.html), where we call functions such as __beforeSaveCorrelation and __afterSaveCorrelation to take care of creating, removing, and updating the entries in the correlation table.
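As a rough illustration of that callback pattern, here is a small Python sketch with an in-memory stand-in for the correlation table. It is not MISP's implementation: `AttributeStore`, `CorrelationTable`, and the rule that only equal values in different events correlate are assumptions for the example; only the hook names echo the functions mentioned above.

```python
class CorrelationTable:
    """In-memory stand-in for a correlations table (hypothetical)."""

    def __init__(self):
        self.rows = set()  # pairs of (attribute_id, correlated_attribute_id)

    def add_pair(self, a, b):
        # Store both directions so a lookup from either side finds the match.
        self.rows.add((a, b))
        self.rows.add((b, a))

    def remove_attribute(self, attr_id):
        self.rows = {row for row in self.rows if attr_id not in row}


class AttributeStore:
    """Toy model whose save/delete operations run correlation hooks inline."""

    def __init__(self):
        self.attributes = {}  # attribute id -> (event_id, value)
        self.correlations = CorrelationTable()

    def save(self, attr_id, event_id, value):
        self._before_save_correlation(attr_id)
        self.attributes[attr_id] = (event_id, value)
        self._after_save_correlation(attr_id)

    def delete(self, attr_id):
        self.attributes.pop(attr_id, None)
        self.correlations.remove_attribute(attr_id)

    def _before_save_correlation(self, attr_id):
        # Drop rows for the old value so an update leaves no orphans.
        self.correlations.remove_attribute(attr_id)

    def _after_save_correlation(self, attr_id):
        event_id, value = self.attributes[attr_id]
        for other_id, (other_event, other_value) in self.attributes.items():
            # Assumption: only equal values in *different* events correlate.
            if other_id != attr_id and other_event != event_id and other_value == value:
                self.correlations.add_pair(attr_id, other_id)


store = AttributeStore()
store.save(1, event_id=10, value="evil.example.com")
store.save(2, event_id=20, value="evil.example.com")
print(sorted(store.correlations.rows))   # [(1, 2), (2, 1)]
store.save(2, event_id=20, value="other.example.com")
print(sorted(store.correlations.rows))   # []
```

In a callback-driven design like this one, the rows are created and removed inside the same save or delete call, so there is no separate job and no interval to wait for.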
Currently, there is no logging in place and IMHO it's best that it stays that way. The correlation process already adds significant overhead to every CRUD operation, and I don't think logging its activities is worth the additional overhead it would create. But it's up for debate, obviously!
Thanks Sami @mokaddem. If you get a chance would you please comment here also: https://github.com/MISP/MISP/issues/7305 ?