smtp_batch: add feature for grouping and templating
Like discussed in #2586, here is the thing. I already did some manual testing regarding templating the subject string. For this configured
- add
extra.my_template_valueto the events (I did this with the modify bot for now) - add that key (
extra.my_template_value) to the newadditional_grouping_keysparameter - use
subject: "Prefix {{ extra_my_template_value }}" - activate the templating capabilites for the subject line with
templating: {subject: true}
But probably some tests in the test suite will be needed for this (but I'd like to defer this one after the first feedback round)
Any comments?
Maybe issues to discuss:
- how values get hashed generically with
hash_arbitrary - passing the
template_dataalong with eachmailas a tuple - really a warning if jinja2 couldn't be imported? (the user might not want jinja2) -- also whether to add this optional requirement to the
requirements.txt?
~Also I didn't get the testing environment setup correctly until now, so it would be nice if someone could trigger the pipelines yet already so I can check the outputs regarding code-style.~
Feel free to have a look, but it will be some time (estimate: 1-2 weeks) until I can have a look at the failing pipelines. My way to go would be
- looking into what makes them fail
- code style should pass obviously
- failing tests here might indicate a not fully backwards compatible change
- feedback (of course any time, also before/while 1.)
- Write additional tests covering the new features
Finally the tests are working. (Sorry for the frequent pushes)
So now from my point of view only some tests for the new features are missing. I'd like to keep the old/current tests as "basic" config with the "basic" set of features and one new test (different config is needed for this anyhow) for the more advanced features (one run which includes the new grouping and the templating).
The test should be easily fixed now. But I noticed another issue (for which we do not have a test yet:
https://github.com/certtools/intelmq/blob/2f15e20422d350f6ac4cfa641d89712f7f306d7c/intelmq/bots/outputs/smtp_batch/output.py#L77-L78
When source.abuse_contact is multiple addesses it is important to use the address stored in the key and not the soruce.abuse_contact field.
The fix I thought about would be to simply add a number to the redis key serving as index which indicates which address of the source.abuse_contact field should be used in case there are multiple ones.
Any other ideas / comments on this approach?
Ok I noted at least with the default harmonization.yml, source.abuse_contact shouldn't be a list. Still the old version also handled the list case. What do you think should we continue supporting that case? (and if yes what do you think about my current approach?)
Apart from that issue this is finished from my point of view (maybe cleanup the commit history even more). Any comments / changes required?
(sorry I'm late again, trying to get to you in the next days)
Whoa, it took me so long. Thanks for having investigated so much but let's stop supporting list and lets merge!
Whoa, it took me so long. Thanks for having investigated so much but let's stop supporting list and lets merge!
Thank you for the review, I rebased on develop, added changelog entries and will merge after the checks