Add support for ClickHouse database as source and sink

Open Delphin1 opened this issue 1 year ago • 4 comments

It would be great if Arroyo could also work with ClickHouse.

Delphin1 avatar Oct 02 '23 15:10 Delphin1

FWIW, if your data is already on Kafka, it's trivial to sync:

  • Kafka to Clickhouse: https://clickhouse.com/docs/en/integrations/kafka#clickhouse-to-kafka
  • Kafka from Clickhouse: https://clickhouse.com/docs/en/integrations/kafka#clickhouse-to-kafka

That said, syncing an Arroyo stream to ClickHouse without Kafka would indeed be cool for long-term storage.

kzk2000 avatar Oct 07 '23 21:10 kzk2000

What would the high-level design and the testing procedure be for implementing this feature? Looks like a cool one.

MuhtasimTanmoy avatar Nov 02 '23 21:11 MuhtasimTanmoy

ClickHouse has various integrations for data ingestion; Kafka, as mentioned above, is just one of them.

I'm no expert, but maybe one of these https://clickhouse.com/docs/en/integrations -> search for "Data ingestion" would work well together with what Arroyo.dev already has.

kzk2000 avatar Nov 02 '23 22:11 kzk2000

Maybe try remote select?

Something like:

SELECT * FROM remote('127.0.0.1', db.remote_engine_table) LIMIT 3;

CH Docs: https://clickhouse.com/docs/en/sql-reference/table-functions/remote

With that in place, you can simply run a remote insert into a ClickHouse table over the native TCP protocol.

This might be easier and faster to implement than a full-blown integration.
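
To make the remote insert concrete, here is a minimal sketch using the `INSERT INTO FUNCTION remote(...)` form from the ClickHouse docs. The database, table name, and columns are assumptions for illustration only, and it assumes a ClickHouse server reachable at 127.0.0.1 on the default native port with the default user:

```sql
-- Hypothetical target database and table; names and columns are illustrative assumptions.
CREATE DATABASE IF NOT EXISTS db;

CREATE TABLE IF NOT EXISTS db.remote_engine_table
(
    event_time DateTime,
    user_id    UInt64,
    payload    String
)
ENGINE = MergeTree
ORDER BY (event_time, user_id);

-- Remote insert through the remote() table function.
-- Arguments: server address, database, table.
INSERT INTO FUNCTION remote('127.0.0.1', 'db', 'remote_engine_table')
VALUES ('2024-03-04 10:00:00', 42, 'hello');

-- Read the rows back through the same table function.
SELECT * FROM remote('127.0.0.1', db.remote_engine_table) LIMIT 3;
```

Whether Arroyo could issue statements like these directly is an open question; the sketch only shows the ClickHouse side of the idea.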

marvin-hansen avatar Mar 04 '24 10:03 marvin-hansen