arroyo
Add support for ClickHouse as a source and sink
It would be great if Arroyo could also work with ClickHouse.
FWIW, if your data is already in Kafka, it's trivial to sync in either direction:
- Kafka to ClickHouse: https://clickhouse.com/docs/en/integrations/kafka#clickhouse-to-kafka
- ClickHouse to Kafka: https://clickhouse.com/docs/en/integrations/kafka#clickhouse-to-kafka
That said, syncing an Arroyo stream to ClickHouse without Kafka would indeed be cool for long-term storage.
What would the high-level design and testing procedure be for implementing this feature? Looks like a cool one.
ClickHouse has various integrations for data ingestion; Kafka, as mentioned above, is just one of them.
I'm no expert, but maybe one of these (https://clickhouse.com/docs/en/integrations -> search for "Data ingestion") would work well with what Arroyo.dev already has.
Maybe try remote select?
Something like:
SELECT * FROM remote('127.0.0.1', db.remote_engine_table) LIMIT 3;
CH Docs: https://clickhouse.com/docs/en/sql-reference/table-functions/remote
With that in place, you can simply run a remote insert into a ClickHouse table over the native TCP protocol.
This might be easier and faster to implement than a full-blown integration.
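For reference, a minimal sketch of what such a remote insert could look like, using ClickHouse's `INSERT INTO FUNCTION` form with the same `remote` table function as the select above. The table name is carried over from that select, and the two inserted values assume a hypothetical two-column table (a string and an integer), so adjust them to the real schema:

```sql
-- Assumption: db.remote_engine_table already exists on the remote server
-- and has exactly two columns, e.g. (name String, value UInt32).
-- With no port given, remote() connects to the server's native TCP port (9000 by default).
INSERT INTO FUNCTION remote('127.0.0.1', db.remote_engine_table)
VALUES ('test', 42);

-- Read the row back through the same table function to check it arrived.
SELECT * FROM remote('127.0.0.1', db.remote_engine_table) LIMIT 3;
```

This only shows the ClickHouse side; how Arroyo would batch rows and issue the insert is left open here.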