WIP: Kafka/AVRO output bot.
This is my first output bot contribution, and should be considered a work in progress.
This output bot goes a bit further than simply outputting all threat intel to a single Kafka topic. The idea is that each piece of intelligence can be routed to a topic dedicated to its source intelligence type. I'm opening this PR to solicit feedback on ironing out this bot.
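To make the routing idea concrete, here is a minimal sketch, assuming the topic is chosen from a single event field. The `topic_for` helper, the choice of `classification.type` as the routing field, and the topic names are all hypothetical and are not taken from this PR's code.

```python
def topic_for(event, topic_map, field="classification.type",
              default_topic="intelmq-events"):
    """Pick a Kafka topic for an event based on one of its fields.

    `topic_map` maps field values to topic names (hypothetical layout);
    events with a missing or unmapped value go to `default_topic`.
    """
    return topic_map.get(event.get(field), default_topic)


# Hypothetical mapping, e.g. what a topics.conf file could be parsed into.
topic_map = {"malware": "intel-malware", "phishing": "intel-phishing"}

event = {"classification.type": "phishing", "source.ip": "192.0.2.1"}
print(topic_for(event, topic_map))
```

A fallback topic keeps events with an unmapped type from being dropped silently; whether that is the desired behaviour is a design question for the bot.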
First of my questions: in order to make Avro work, I have to define an Avro schema. For IntelMQ, this is a large data type mapping that should be stored as a configuration template. An example can be seen here.
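As a rough illustration of what such a mapping could look like, the sketch below derives an Avro record schema from a small harmonization-style dictionary. The `AVRO_TYPES` table, the `build_schema` helper, and the record name are assumptions made for this example; the actual schema files in this PR may be structured differently.

```python
import json

# Illustrative mapping from IntelMQ harmonization types to Avro primitives.
# This table is an assumption; unknown types fall back to "string".
AVRO_TYPES = {
    "String": "string",
    "IPAddress": "string",
    "DateTime": "string",
    "Integer": "int",
    "Float": "double",
    "Boolean": "boolean",
}


def build_schema(harmonization, name="Event"):
    """Build an Avro record schema (as a dict) with all fields optional."""
    fields = []
    for field_name, spec in sorted(harmonization.items()):
        avro_type = AVRO_TYPES.get(spec["type"], "string")
        fields.append({
            # Dots are replaced so the schema uses '_' based field names.
            "name": field_name.replace(".", "_"),
            # A union with "null" first makes the field optional.
            "type": ["null", avro_type],
            "default": None,
        })
    return {"type": "record", "name": name, "fields": fields}


harmonization = {
    "source.ip": {"type": "IPAddress"},
    "source.port": {"type": "Integer"},
}
schema = build_schema(harmonization)
print(json.dumps(schema, indent=2))
```

The JSON printed at the end is the kind of content that would go into a `.avsc` file.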
Another thing to note is that for output, especially to Avro, fields need to be flattened and all `.` characters need to be removed from field names, so in the Avro schema the field names are statically defined with `_` in place of `.`. I'd be curious to hear any thoughts on the liberties taken with field renaming and flattening for this use case.
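The renaming and flattening described above can be sketched as follows. This is a minimal example, not the code from this PR: the `flatten_event` helper and its `replacement` parameter are hypothetical names chosen for illustration.

```python
def flatten_event(event, replacement="_", parent=""):
    """Flatten nested dicts and replace '.' in keys with `replacement`."""
    flat = {}
    for key, value in event.items():
        # Prefix nested keys with the parent's name, then drop all dots.
        name = f"{parent}{replacement}{key}" if parent else key
        name = name.replace(".", replacement)
        if isinstance(value, dict):
            # Recurse so arbitrarily deep structures end up on one level.
            flat.update(flatten_event(value, replacement, name))
        else:
            flat[name] = value
    return flat


event = {"source.ip": "192.0.2.1", "extra": {"tags": {"count": 1}}}
flat = flatten_event(event)
print(flat)
```

One caveat with this approach: two different keys can collapse to the same flattened name (for example `a.b` and `a_b`), in which case the later value overwrites the earlier one.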
Codecov Report
Merging #1251 into develop will decrease coverage by 0.32%. The diff coverage is 25%.

| Coverage Diff | develop | #1251 | +/- |
|---|---|---|---|
| Coverage | 75.14% | 74.81% | -0.33% |
| Files | 260 | 261 | +1 |
| Lines | 12131 | 12211 | +80 |
| Branches | 1623 | 1637 | +14 |
| Hits | 9116 | 9136 | +20 |
| Misses | 2662 | 2721 | +59 |
| Partials | 353 | 354 | +1 |

| Impacted Files | Coverage Δ | |
|---|---|---|
| intelmq/bots/outputs/kafka/output.py | 25% <25%> (ø) |
> First of my questions: in order to make Avro work, I have to define an Avro schema. For IntelMQ, this is a large data type mapping that should be stored as a configuration template. An example can be seen here.

Where's the question? :D
> Another thing to note is that for output, especially to Avro, fields need to be flattened and all `.` characters need to be removed from field names, so in the Avro schema the field names are statically defined with `_` in place of `.`. I'd be curious to hear any thoughts on the liberties taken with field renaming and flattening for this use case.

This is also the case for the Elasticsearch output bot, where a parameter is used to specify the replacement character, with `_` as the default. (This is no longer necessary in Elasticsearch, see #1188.)
So the question is simply: where should I put the files in the source tree so they can be referenced? Should I add them to contrib? On my instance I put the config files in /opt/intelmq/var/lib/bots/kafka-output/[key_file.avsc, topics.conf, value_file.avsc]
AFAIU this is still WIP, right?
Changing the Milestone then to the next version, 1.1.0 is now coming soon.
Yep, still a WIP. Sorry as always for the delay.
On Thu, Jun 21, 2018, 8:28 AM Wagner wrote:

> AFAIU this is still WIP, right?
> Changing the Milestone then to the next version, 1.1.0 is now coming soon.
@z0r0 are you still working on this? The PR is in pretty good shape, and it would be a pity not to continue the work.
Hello, we ended up not implementing it and went with something else at work. That being said, I'm going to finish this one up on my own time in the coming weeks, so stay tuned.
> Hello, we ended up not implementing it and went with something else at work. That being said, I'm going to finish this one up on my own time in the coming weeks, so stay tuned.

Sorry to hear that and thanks for your offer to finish it! Please let us know how we can support you.