druid-docker
Update druid to the latest version available
Druid is now at version 0.8.2, while the base Dockerfile still points to 0.7.1.1 by default. A new version should be released to the registry with Druid 0.8.2.
Hi @thedrow - we have abandoned plans to use Druid in our systems and I don't plan to maintain this repo. Do you want to take it over?
Right now I'm evaluating it. I'll ping you if I need commit access to the repository.
@zcox Could you say in a few words why you decided against Druid? Are there any drawbacks we should know about? We are considering Druid for an in-company analytics service similar to Google Analytics, but realtime, with 100-200 million events/day and 10-50 concurrent ad-hoc queries across realtime and historical nodes. We expect sub-second answers no matter where the request goes, whether to a realtime node or to data from years ago. At first glance Druid looks like a perfect fit for this kind of task, compared with messing around with Spark or Flink streaming map-reduce jobs as intermediaries.
@ravlio this tweet sums it up: https://twitter.com/zcox/status/666304216290324480
Druid is extremely difficult, complex and time-consuming to set up initially and to operate in production. So unless you really need to ingest 1M+ events/sec AND you have a dedicated team to take care of it, just use Elasticsearch. The aggregations API in Elasticsearch 2 is great, and fast. We also did not like the fact that Druid requires you to run more realtime nodes whenever you want to create a new data source.
200M events/day is roughly 2300 events/sec. I imagine an Elasticsearch cluster could ingest that just fine, especially if you're pre-aggregating in a Flink streaming job. We are doing exactly that: apps => kafka => flink => elasticsearch. That is much simpler to operate than Druid.
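For anyone who wants to check that conversion, here is a minimal sketch of the arithmetic. It assumes events are spread evenly over a 24-hour day, which real traffic rarely is, so peak rates will be higher than this average. The function name `events_per_second` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_second(events_per_day: float) -> float:
    """Average ingest rate, assuming a perfectly even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


# The 100-200 million events/day range mentioned earlier in the thread.
for daily in (100_000_000, 200_000_000):
    rate = events_per_second(daily)
    print(f"{daily:,} events/day is about {rate:,.0f} events/sec on average")
```

The upper end works out to a little over 2300 events/sec, in line with the rough figure above.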