intelmq icon indicating copy to clipboard operation
intelmq copied to clipboard

Add new bot: time filter

Open mariuskarotkis opened this issue 3 years ago • 9 comments

Add new bot: time filter

mariuskarotkis avatar May 26 '21 11:05 mariuskarotkis

We already have the "filter" expert which can filter on (non-)equality, non-existence and regular expressions (match and non-match). We also have the sieve expert with filtering capabilities on all of the above plus existence, substrings, numeric comparisons, network ranges, set/list operations and all with typing support.

I'm reflecting what is the best user experience

  1. having a dedicated bot for filtering only on time (con: causes the bot-list to grow with yet-another filter expert), or
  2. expanding the existing filter expert to time-filtering (see also #569), (con: makes the configuration of the bot more complex), and/or
  3. expanding the sieve expert to time filtering

1 and 2 contradict each other, while 3 is orthogonal. I'm currently tending towards 1 (which is, what this PR does), as 2 means a more complex configuration of the bot - but I want to sleep over it first and hear others' opinions. And 3, our swiss-army knife, would be cool as well =)

ghost avatar Jun 07 '21 20:06 ghost

I not sure yet what the purpose of this bot is due to missing docs, however I want to point out named queued ("paths") in IntelMQ. Using them could extend the functionality of the bot, by keeping the configuration simple. Thay are used in the "normal" filter expert like this: https://intelmq.readthedocs.io/en/latest/user/bots.html#filter (section Possible paths) and https://github.com/certtools/intelmq/blob/develop/intelmq/bots/experts/filter/expert.py#L105-L143

For example, with a single parameter (timespan), multiple use-cases could be solved, all used in parallel by the IntelMQ user

  • path older_than receives all data matching event < now()-timespan
  • path newer_than receives all data matching event > now()-timespan
  • path older_than_future receives all data matching event < now()+timespan
  • path newer_than_future receives all data matching event > now()+timespan

or similar.

ghost avatar Jun 14 '21 12:06 ghost

@mariuskarotkis The tests (or other Actions) of all your remaining PRs are failing currently. Could you please have a look at them? Please also apply the lessons learnt from the review of the merged PRs - if applicable to the new ones. If a PR is ready for review from your side, please give us a short ping.

ghost avatar Aug 20 '21 06:08 ghost

Please check tests.

mariuskarotkis avatar Aug 20 '21 07:08 mariuskarotkis

The pycodestyle check fails with

intelmq/lib/utils.py:925:13: E701 multiple statements on one line (colon)

For the nosetests, you introduced a new testing requirement freeze_time. But that does not exist: https://pypi.org/project/freeze_time/ and causes the check to fail. Maybe you meant freezegun? However, we already use time-machine is a similar test case: https://github.com/certtools/intelmq/blob/develop/intelmq/tests/bots/experts/aggregate/test_expert.py We evaluated various libraries for simulating the time and that was the best matching result. It would also support decorators: https://github.com/adamchainz/time-machine#readme I'd prefer if we can stick to one time travelling library to keep the maintenance low - if possible.

ghost avatar Aug 20 '21 13:08 ghost

Codecov Report

Merging #1969 (8c72554) into develop (1a7a339) will decrease coverage by 0.57%. The diff coverage is 86.66%.

:exclamation: Current head 8c72554 differs from pull request most recent head e7427bd. Consider uploading reports for the commit e7427bd to get more accurate results

:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff             @@
##           develop    #1969      +/-   ##
===========================================
- Coverage    76.91%   76.35%   -0.57%     
===========================================
  Files          454      456       +2     
  Lines        24120    24052      -68     
  Branches      3516     3793     +277     
===========================================
- Hits         18553    18366     -187     
- Misses        4793     4938     +145     
+ Partials       774      748      -26     
Impacted Files Coverage Δ
intelmq/bots/experts/time_filter/expert.py 79.41% <79.41%> (ø)
...elmq/tests/bots/experts/time_filter/test_expert.py 92.50% <92.50%> (ø)
intelmq/lib/utils.py 72.36% <100.00%> (-0.78%) :arrow_down:
intelmq/bin/intelmqdump.py 18.30% <0.00%> (-24.56%) :arrow_down:
intelmq/bin/intelmqctl.py 9.18% <0.00%> (-4.34%) :arrow_down:
intelmq/lib/processmanager.py 23.99% <0.00%> (-3.82%) :arrow_down:
intelmq/bots/experts/reverse_dns/expert.py 83.87% <0.00%> (-0.26%) :arrow_down:
intelmq/tests/lib/test_utils.py 99.48% <0.00%> (-0.04%) :arrow_down:
intelmq/tests/bin/test_intelmqdump.py 100.00% <0.00%> (ø)
... and 5 more

codecov-commenter avatar Aug 20 '21 13:08 codecov-commenter

When I read the docs of this new bot I don't understand the difference to the existing filter expert using not_before = 2 days (the value of the timespan parameter here). Can you please elaborate on docs more, so that the advantage / difference to the filter expert is recognizable?

ghost avatar Aug 27 '21 15:08 ghost

@sebix Please check reviewer is active or no.

mariuskarotkis avatar Jan 12 '23 08:01 mariuskarotkis

Can you answer this question in the docs?

When I read the docs of this new bot I don't understand the difference to the existing filter expert using not_before = 2 days (the value of the timespan parameter here). Can you please elaborate on docs more, so that the advantage / difference to the filter expert is recognizable?

sebix avatar Jan 20 '23 21:01 sebix