
Add basic Syslog UDP collector bot

Open creideiki opened this issue 5 years ago • 7 comments

Extremely basic, probably too slow, but simple and working Syslog collector over UDP.

We will probably not be running this in production, but I had already written it as a proof of concept, and thought it marginally more useful to share the code than quietly disposing of it. Especially since the existing pull request for a Syslog collector in #848 no longer works because of changes in both IntelMQ and Python.
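For readers unfamiliar with the approach, here is a minimal sketch of a blocking UDP receive, not the code from this pull request. The function names `open_udp_listener` and `receive_one` are hypothetical, and the loopback address and ephemeral port are assumptions chosen so the example is safe to run locally.

```python
import socket


def open_udp_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Create a UDP socket bound to host:port (port 0 lets the OS pick one)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one(sock: socket.socket, bufsize: int = 65535) -> str:
    """Block until one datagram arrives and return its payload as text.

    While this call blocks or the caller processes the result, further
    datagrams queue in the kernel buffer and are dropped once it is full.
    """
    data, _sender = sock.recvfrom(bufsize)
    # Undecodable bytes are replaced rather than raising an exception.
    return data.decode("utf-8", errors="replace")
```

A real syslog daemon would normally send to port 514, which requires elevated privileges to bind on most systems.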

creideiki avatar Sep 18 '20 08:09 creideiki

Thanks for your contribution!

This PR is marked as draft, is this intentional?

Even if it is not perfect, I'm fine to merge it as long as it is functional. I'd add some explanation in Bots.md, also linking rsyslog's documentation as hint for a set-up (e.g. https://www.rsyslog.com/doc/master/configuration/examples.html which contains examples).

ghost avatar Oct 10 '20 20:10 ghost

> This PR is marked as draft, is this intentional?

Yes, as I do not consider this functionality even remotely ready for production. I made this as a proof of concept, but for production we'll be sending syslog traffic using AMQP through a RabbitMQ server. Obvious deficiencies in this bot include:

  1. Only does UDP, not TCP.
  2. Synchronous single-threaded design means abysmal performance and probably dropped messages under even light load.
  3. Doesn't validate the syslog data format at all. Luckily, syslog is simple enough that treating it as a simple text string sort of works, but loses information such as the reporting hostname and the timestamp.
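To illustrate the third point, the following sketch shows which fields a raw-text treatment discards. It assumes messages in the traditional BSD syslog layout (RFC 3164: `<PRI>Mmm dd hh:mm:ss hostname message`); the names `SyslogHeader` and `parse_rfc3164` are hypothetical and not part of this pull request, and real-world senders often deviate from this layout.

```python
import re
from typing import NamedTuple, Optional


class SyslogHeader(NamedTuple):
    facility: int
    severity: int
    timestamp: str
    hostname: str
    message: str


# Assumed layout: "<PRI>Mmm dd hh:mm:ss hostname message" (RFC 3164 style).
_RFC3164 = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<message>.*)$",
    re.DOTALL,
)


def parse_rfc3164(line: str) -> Optional[SyslogHeader]:
    """Return the parsed header fields, or None if the line does not match."""
    match = _RFC3164.match(line)
    if match is None:
        return None
    # The priority value encodes facility * 8 + severity.
    facility, severity = divmod(int(match["pri"]), 8)
    return SyslogHeader(
        facility, severity, match["timestamp"], match["hostname"], match["message"]
    )
```

Returning `None` for non-matching input lets a caller fall back to forwarding the raw string, which is what the bot currently does for everything.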

> Even if it is not perfect, I'm fine to merge it as long as it is functional. I'd add some explanation in Bots.md, also linking rsyslog's documentation as hint for a set-up (e.g. https://www.rsyslog.com/doc/master/configuration/examples.html which contains examples).

I'm wary of people not considering any documented caveats and attempting to use this code for things it wasn't designed for, losing data in the process.

creideiki avatar Oct 12 '20 08:10 creideiki

Thanks for your response. I think the collector should be called "UDP", not "Syslog", as syslog is just the data format (relevant for parsing), not the transport protocol.

ghost avatar Oct 16 '20 11:10 ghost

Something like this (which is totally untested)?

This does present the problem that there is already a collector named "tcp", which accepts IntelMQ messages, not raw bytes. Maybe this one should be called "udp_text" or "udp_raw" to distinguish them, and to make clear that two possible bots (IntelMQ messages over UDP and raw text over TCP) are not implemented?

creideiki avatar Oct 20 '20 14:10 creideiki

Codecov Report

Merging #1611 into develop will decrease coverage by 0.05%. The diff coverage is 35.71%.

@@             Coverage Diff             @@
##           develop    #1611      +/-   ##
===========================================
- Coverage    75.55%   75.50%   -0.06%     
===========================================
  Files          391      392       +1     
  Lines        19700    19728      +28     
  Branches      2708     2709       +1     
===========================================
+ Hits         14885    14895      +10     
- Misses        4230     4248      +18     
  Partials       585      585              
| Impacted Files | Coverage Δ |
|---|---|
| intelmq/bots/collectors/udp/collector.py | 35.71% <35.71%> (ø) |

codecov-io avatar Oct 20 '20 15:10 codecov-io

Concerning the TCP collector issue: Previously we had no use case for the TCP collector other than the IntelMQ-to-IntelMQ connection. If we have more, I'd be in favour of offering both: the collector could then receive arbitrary input (like syslog) while still supporting the IntelMQ "flavor" (with the "Ok" message).

cc @e3rd (tcp collector/output author & user)

ghost avatar Oct 21 '20 09:10 ghost

If I remember correctly, the TCP output has the parameter counterpart_is_intelmq. Depending on that, it waits for an "Ok" message after each message it sends. The TCP collector just sends "Ok" after every message it receives, but I assumed this would not pose a problem for arbitrary input. If it does, a counterpart_is_intelmq parameter could easily be added so that the collector stops sending "Ok".
That was the question, right?
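As a rough sketch of what that could look like on the receiving side: the function `receive_and_ack` is hypothetical, the flag on the collector is only the suggestion above and not an existing option, and the exact bytes of the "Ok" reply and the single `recv` call are assumptions that ignore whatever framing the real TCP bots use.

```python
import socket


def receive_and_ack(
    conn: socket.socket, counterpart_is_intelmq: bool, bufsize: int = 4096
) -> bytes:
    """Read one chunk from the connection and optionally acknowledge it.

    Assumption: the acknowledgement is the literal bytes b"Ok". A sender
    that is not IntelMQ (for example a syslog daemon) gets no reply.
    """
    data = conn.recv(bufsize)
    if data and counterpart_is_intelmq:
        conn.sendall(b"Ok")
    return data
```

With the flag off, the collector behaves as a plain sink, which is what an arbitrary sender such as rsyslog would expect.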

e3rd avatar Oct 21 '20 12:10 e3rd