New `timescale` sink

Open binarylogic opened this issue 6 years ago • 9 comments

I'd like to support Timescale as a sink, and after talking to them it appears we just need to support Postgres since Timescale is built on Postgres. I'm hesitant to call it a Postgres sink given that we'll probably want to make certain assumptions about how we're writing data. Before we begin work we should determine if this is the appropriate course of action.

Ref https://github.com/timberio/vector/issues/935

binarylogic avatar Sep 26 '19 16:09 binarylogic

I’m not familiar with Timescale, but I wonder whether it would make sense to support unified SQL sinks, which would include Postgres, ClickHouse, and, at least for the sake of easy tests, SQLite. The idea is to have the same configuration structure for all of them and, ideally, to support some kind of automatic table creation/migrations.

ghost avatar Sep 28 '19 21:09 ghost

As far as writing data there is no difference between Timescale and Postgres.

It would be up to the user to configure the destination tables (and hypertables in the TimescaleDB case) before the sink is used.

jamessewell avatar Oct 03 '19 09:10 jamessewell

A general unified SQL sink might be challenging given the language variants between the different databases.

But @jamessewell is correct, there should be no difference in writing data between Timescale and Postgres (but there may be other reasons for creating a Timescale specific sink).

akulkarni avatar Oct 08 '19 02:10 akulkarni

I forgot to note our progress here: this can be achieved using PostgREST (a REST gateway for PostgreSQL). Performance is very good.
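The thread does not describe the actual setup, so the following is only a minimal sketch of what a bulk insert through PostgREST could look like. It builds (without sending) a `POST` request to a table endpoint with a JSON array body. The base URL `http://localhost:3000`, the table name `logs`, and the event fields are assumptions made for illustration, not details from this issue.

```python
import json
import urllib.request


def build_postgrest_insert(base_url, table, events):
    """Build (but do not send) a bulk-insert request for a PostgREST table endpoint."""
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{table}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Ask PostgREST not to echo the inserted rows back.
            "Prefer": "return=minimal",
        },
    )


# Hypothetical endpoint, table name, and event shape, for illustration only.
request = build_postgrest_insert(
    "http://localhost:3000",
    "logs",
    [{"time": "2020-02-21T00:00:00Z", "message": "hello"}],
)
# urllib.request.urlopen(request) would send it to a running PostgREST instance.
```

The destination table still has to exist beforehand; PostgREST only exposes tables that are already defined in the database.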

jamessewell avatar Feb 21 '20 00:02 jamessewell

@jamessewell Could you please elaborate on the details of your implementation, if possible? Are you using the http sink to send events to PostgREST? And how do you create the tables in TimescaleDB before starting to push events?

123BLiN avatar May 20 '21 08:05 123BLiN

Note that there is a more generic issue #6556 which should probably be implemented first. As discussed above, it might even already solve this issue.

ypid-geberit avatar Aug 17 '21 12:08 ypid-geberit

Repurposing this to be for timescale specifically. https://github.com/vectordotdev/vector/issues/15765 is tracking a generic postgres sink.

jszwedko avatar Dec 28 '22 16:12 jszwedko

I wonder if what is done at https://github.com/vectordotdev/vector/pull/21248 would work out-of-the-box for timescale. Is anyone willing to try? I could try that if I found enough time 😃

jorgehermo9 avatar Mar 06 '25 07:03 jorgehermo9

I think we can close this issue as the postgres sink should work with timescale. @isbm Afaik, you are using timescale, right? (as you said here)

@jszwedko

jorgehermo9 avatar Jun 15 '25 21:06 jorgehermo9

I think at some point folks were talking about the timescaledb sink setting up the hypertable and perhaps other options, but I think folks are probably okay with that not being included in vector and setting up the schema themselves via some other means.

CameronNemo avatar Jun 16 '25 16:06 CameronNemo

Thanks @CameronNemo . That is something I was wondering about. I'm not familiar with Timescale so I'm not sure if there are additional things that would be beneficial to include in a timescale sink beyond simply sending the data (which the postgres sink is capable of, thanks @jorgehermo9 !). I think someone may need to dig into this more to validate there wouldn't be additional benefit to a timescale sink that wraps the postgres sink before closing it.

jszwedko avatar Jun 16 '25 17:06 jszwedko

I think we could take this approach if custom behaviour for timescale is needed https://github.com/vectordotdev/vector/issues/21308#issuecomment-2380873820. But it would be very useful to have a concrete feature request (for example, enable that hypertable parameter) instead of a generic one.

The current postgres sink requires the user to manage the DDL, as it does not create tables. I think it would be the same with hypertables (docs: https://docs.timescale.com/api/latest/hypertable/#the-hypertable-workflow) @CameronNemo

jorgehermo9 avatar Jun 16 '25 17:06 jorgehermo9

TimescaleDB has a few things which apply here:

create hypertable (function and CREATE TABLE API available): Creates a hypertable (automagically partitioned just in time) from a table

enable columnstore (CREATE TABLE / ALTER TABLE API): Activates a columnar store for a hypertable

add columnstore policy: For a hypertable with a columnstore, defines the window after which rows are moved across

add retention policy: Defines a window after which rows are removed from the hypertable

If you're passing the burden of DDL to the user, and you're not querying the hypertables (ingest only) then you don't need to implement any of these.
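As a rough illustration of the DDL a user would run themselves before pointing a sink at the database, here is a minimal sketch that generates the statements for a hypertable and a retention policy. The table name `vector_logs`, its columns, and the 30-day window are assumptions for illustration only. `create_hypertable` and `add_retention_policy` are TimescaleDB SQL functions, but their exact signatures vary between versions, so check the docs for the version in use. Columnstore statements are left out for the same reason.

```python
def hypertable_ddl(table, time_column="time", retention="30 days"):
    """Return the SQL statements a user could run once, before ingesting data.

    The names are interpolated directly, so they must be trusted identifiers.
    """
    return [
        # Plain Postgres table; the column layout here is hypothetical.
        f"CREATE TABLE {table} ({time_column} TIMESTAMPTZ NOT NULL, host TEXT, message TEXT);",
        # Turn the table into a hypertable partitioned on the time column.
        f"SELECT create_hypertable('{table}', '{time_column}');",
        # Drop rows older than the retention window.
        f"SELECT add_retention_policy('{table}', INTERVAL '{retention}');",
    ]


statements = hypertable_ddl("vector_logs")
for statement in statements:
    print(statement)
```

None of this needs to live in the sink itself for an ingest-only setup: the statements run once against the database, and writes afterwards are ordinary inserts into the table.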

jamessewell avatar Jun 16 '25 21:06 jamessewell

@isbm Afaik, you are using timescale, right?

Yes. But we are losing data on Vector shutdown tho. 😉

isbm avatar Jun 17 '25 08:06 isbm