intelmq
Add basic Syslog UDP collector bot
Extremely basic, probably too slow, but simple and working Syslog collector over UDP.
We will probably not be running this in production, but I had already written it as a proof of concept, and thought it marginally more useful to share the code than quietly disposing of it. Especially since the existing pull request for a Syslog collector in #848 no longer works because of changes in both IntelMQ and Python.
Thanks for your contribution!
This PR is marked as draft, is this intentional?
Even if it is not perfect, I'm fine to merge it as long as it is functional. I'd add some explanation in Bots.md, also linking rsyslog's documentation as hint for a set-up (e.g. https://www.rsyslog.com/doc/master/configuration/examples.html which contains examples).
> This PR is marked as draft, is this intentional?
Yes, as I do not consider this functionality even remotely ready for production. I made this as a proof of concept, but for production we'll be sending syslog traffic using AMQP through a RabbitMQ server. Obvious deficiencies in this bot include:
- Only does UDP, not TCP.
- Synchronous single-threaded design means abysmal performance and probably dropped messages under even light load.
- Doesn't validate the syslog data format at all. Luckily, syslog is simple enough that treating it as a simple text string sort of works, but loses information such as the reporting hostname and the timestamp.
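To make the last two points concrete, here is a minimal sketch of a blocking UDP receiver that treats each datagram as an opaque text line. It is not the code from this PR: the helper names (`open_udp_socket`, `receive_one`, `split_priority`) are made up for illustration, and it only uses Python's standard `socket` and `re` modules. The only piece of the syslog header it looks at is the leading `<PRI>` value, which encodes facility and severity as `facility * 8 + severity`; the hostname and timestamp stay buried in the raw string.

```python
import re
import socket

# Hypothetical helper names for illustration; not taken from this PR.
PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


def open_udp_socket(host="127.0.0.1", port=0):
    """Bind a UDP socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one(sock, bufsize=65535, timeout=5.0):
    """Block until one datagram arrives and return it as text.

    While this call (or the processing after it) is busy, further
    datagrams queue in the kernel buffer and are dropped once it fills.
    """
    sock.settimeout(timeout)
    data, _addr = sock.recvfrom(bufsize)
    return data.decode("utf-8", errors="replace")


def split_priority(line):
    """Return (facility, severity) from a leading <PRI>, or None if absent."""
    match = PRI_PATTERN.match(line)
    if match is None:
        return None
    return divmod(int(match.group(1)), 8)
```

A real collector would call `receive_one` in a loop and hand each line to the pipeline. Anything slower than the sender's rate in that loop translates directly into lost messages, which is the performance concern above.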
> Even if it is not perfect, I'm fine to merge it as long as it is functional. I'd add some explanation in Bots.md, also linking rsyslog's documentation as hint for a set-up (e.g. https://www.rsyslog.com/doc/master/configuration/examples.html which contains examples).
I'm wary of people not considering any documented caveats and attempting to use this code for things it wasn't designed for, losing data in the process.
Thanks for your response. I think the collector should be called "UDP", not "Syslog", as syslog is just the data format (relevant for parsing), not the transport protocol.
Something like this (which is totally untested)?
This does present the problem that there is already a collector named "tcp", which accepts IntelMQ messages, not raw bytes. Maybe this should be called "udp_text" or "udp_raw" to distinguish them, and make clear that there are two possible bots (IntelMQ messages over UDP and raw text over TCP) not implemented?
Codecov Report
Merging #1611 into develop will decrease coverage by 0.05%. The diff coverage is 35.71%.
@@ Coverage Diff @@
## develop #1611 +/- ##
===========================================
- Coverage 75.55% 75.50% -0.06%
===========================================
Files 391 392 +1
Lines 19700 19728 +28
Branches 2708 2709 +1
===========================================
+ Hits 14885 14895 +10
- Misses 4230 4248 +18
Partials 585 585
| Impacted Files | Coverage Δ | |
|---|---|---|
| intelmq/bots/collectors/udp/collector.py | 35.71% <35.71%> (ø) |
Concerning the TCP collector issue: Previously we had no other use-case for the TCP collector than the IntelMQ to IntelMQ connection. If we have more, I'd be for offering both functionalities: The collector could then be able to receive arbitrary input (like syslog) but can also be capable of receiving the IntelMQ "flavor" (with the "Ok" message).
cc @e3rd (tcp collector/output author & user)
If I remember correctly, the TCP output has the parameter counterpart_is_intelmq. Depending on that, it waits to receive an "Ok" message after each message it sends.
The TCP collector just sends "Ok" after every message it gets, but I assumed this would not pose a problem for arbitrary input. If it does pose a problem, a counterpart_is_intelmq parameter could easily be added to the collector so that it stops sending "Ok".
That was the question, right?
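A rough sketch of what that switch could look like on the receiving side. This is an assumption-laden illustration, not the existing TCP collector: `handle_message` is a hypothetical function, the `counterpart_is_intelmq` flag on the collector is only the proposal from this thread, and the sketch assumes one `recv` call yields one whole message, ignoring whatever framing the real bots use.

```python
import socket


def handle_message(conn, counterpart_is_intelmq, bufsize=4096):
    """Read one message from a connected socket and optionally acknowledge it.

    Hypothetical sketch: assumes a single recv() returns a complete message.
    """
    data = conn.recv(bufsize)
    if counterpart_is_intelmq:
        # An IntelMQ TCP output waits for this acknowledgement.
        conn.sendall(b"Ok")
    # For arbitrary senders (e.g. syslog over TCP) nothing is written back.
    return data
```

With the flag off, a sender that never reads from the socket is not affected by unsolicited "Ok" bytes.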