feat(sink): Add postgres sink
Closes #15765
This PR is not 100% ready on my side and there are likely a few things wrong, but I had a few questions and wanted to know if the direction seems right, so I would like an initial round of review if possible.
I tested the sink and it seems to be working, but I lack a lot of knowledge about Vector's internals and I'm not sure if the implementation is okay.
I took a lot of inspiration from the databend and clickhouse sinks, but left a few questions as TODOs in the source. I found this sink a bit different from the others: they go through a request_builder and encode the payload into bytes (since most sinks are HTTP based), but that didn't seem to fit well here, because with the sqlx API I should wrap the events in the sqlx::types::Json type, which does all the encoding internally with serde.
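For illustration, here is a minimal sketch (not the exact code in this PR) of how an insert could look through sqlx, assuming a table with a single JSONB payload column and sqlx built with its postgres and json features:

use serde_json::Value;
use sqlx::{types::Json, PgPool};

// Minimal sketch: wrapping the event in sqlx::types::Json lets sqlx/serde do
// the encoding at bind time, instead of building a byte payload up front.
// The table and column names are illustrative, not the sink's real schema.
async fn insert_event(pool: &PgPool, event: Value) -> Result<(), sqlx::Error> {
    sqlx::query("INSERT INTO test (payload) VALUES ($1)")
        .bind(Json(event)) // Json<T: Serialize> is sent to Postgres as JSONB
        .execute(pool)
        .await?;
    Ok(())
}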
If someone wants to manually test it, I used this Vector config:
[sources.demo_logs]
type = "demo_logs"
format = "apache_common"
[transforms.payload]
type = "remap"
inputs = ["demo_logs"]
source = """
.payload = .
"""
[sinks.postgres]
type = "postgres"
inputs = ["payload"]
endpoint = "postgres://postgres:postgres@localhost/test"
table = "test"
Run a postgres server with podman run -e POSTGRES_PASSWORD=postgres -p 5432:5432 docker.io/postgres
and execute the following with psql -h localhost -U postgres:
CREATE DATABASE test;
then execute \c test
and last:
CREATE TABLE test (message TEXT, timestamp TIMESTAMP WITH TIME ZONE, payload JSONB);
And then you will see the logs in that table.
Thank you very much for the review @pront! I'm kinda busy these days but I will revisit this as soon as I can :)
There are a few failing checks. Also, let's add a new postgres semantic scope in https://github.com/vectordotdev/vector/blob/master/.github/workflows/semantic.yml. I will review once these are addressed. Thank you!
I will work on this PR these days, I'll ping you whenever it is ready for another round. Thank you so much @pront!
Hi @pront. I'm sorry for the delay. I added a bunch of new integration tests and I'm ready for another review round.
I still have a few doubts left marked as // TODO in the code.
I also wonder if I should enable metrics & traces inputs in this sink (see this thread https://github.com/vectordotdev/vector/pull/21248#discussion_r1855446211). If so, I will add a few more tests similar to the insert_multiple_events test I added in this PR. Metrics can be useful for sinks like timescale https://github.com/vectordotdev/vector/issues/939, but I don't know if the current postgres sink is compatible with timescaledb. I think with timescaledb we should use this same sink but with another flag (see this comment), so I think we should really think ahead about whether this sink will be used for more than just logs.
I think there are a few open comment threads; feel free to resolve them if you think my answer is enough. I didn't resolve them myself just in case my answer is not clear.
Hi @jorgehermo9, I will take another look now. Thanks!
I also wonder if I should enable metrics & traces inputs in this sink (see this thread #21248 (comment)). If so, I will add a few more tests similar to the insert_multiple_events test I added in this PR.
Yes, I was going to suggest exactly this. Add more inputs and observe how it works.
Metrics can be useful for sinks like timescale #939. But I don't know if the current postgres sink is compatible with timescaledb. I think with timescaledb we should use this same sink but with another flag (see this comment), so I think we should really think ahead about whether this sink will be used for more than just logs.
Are we talking about timescaledb and risingwave? Are there more? I am not familiar with all the nuances and I don't think we can plan for all extensions in advance. The proposed flavour (or flavor) property can be added incrementally without breaking the behavior of the sink.
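For illustration only, such a flavor option could start as a small config enum whose default preserves today's behavior; the name and variants below are hypothetical and not part of this PR:

use serde::{Deserialize, Serialize};

// Hypothetical sketch of an incrementally added `flavor` option.
// Defaulting to plain Postgres keeps existing configs working unchanged.
#[derive(Clone, Copy, Debug, Default, Deserialize, Serialize)]
#[serde(rename_all = "snake_case")]
pub enum PostgresFlavor {
    // Plain PostgreSQL: the current behavior and the default.
    #[default]
    Postgres,
    // A later variant could adjust insert behavior for TimescaleDB hypertables.
    Timescaledb,
}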
Just dropping by to say that I am looking forward to introducing this new integration in Vector. And don't hesitate to give me a shout if you need anything.
Hi @pront! I'm sorry that I couldn't work on this these past weeks. I still have it on my radar and planning to answer your review ASAP.
Thank you for your reminder and I can (and will) spend more time on this in the following weeks. Sorry for the delay!
Of course! No need to apologize at all.
I just wanted to express my excitement about this work and to let you know I'm here to support you whenever you're ready.
Hi @pront, I have more time available these days and will work on this PR. What's left is to test how this sink behaves with metrics and traces and enable them if they work OK. I will post an update about that soon. I think I answered all of your comments, but correct me if I'm wrong.
I will also think about adding more integration tests (and if traces & metrics work, I'll add integration tests for those too).
Are we talking about timescaledb and risingwave? Are there more? I am not familiar with all the nuances and I don't think we can plan for all extensions in advance. The proposed flavour (or flavor) property can be added incrementally without breaking the behavior of the sink.
Yes, I was talking about those two specifically. I agree not to plan in advance; we can look into that in future PRs.
Awesome, I will prioritize reviewing this PR.
I tried with metrics (traces are still left to test)
with this config
[sources.internal_metrics]
type = "internal_metrics"
[sinks.postgres]
type = "postgres"
inputs = ["internal_metrics"]
endpoint = "postgres://postgres:password123@localhost/test"
table = "metrics"
and these commands to spin up a local postgres and connect to it:
docker run -e POSTGRES_PASSWORD=password123 -p 5432:5432 docker.io/postgres
psql -h localhost -U postgres
and these SQL statements inside psql:
CREATE DATABASE test;
\c test
CREATE TABLE metrics(name TEXT, namespace TEXT, tags JSONB, timestamp TIMESTAMP WITH TIME ZONE, kind TEXT, counter JSONB, gauge JSONB, aggregated_histogram JSONB);
The table is populated with the given metrics and we can query it.
I had to create a table with the union of the fields of all metric types (counter JSONB, gauge JSONB, aggregated_histogram JSONB), but once the table parameter is templatable, we could have different postgres tables per metric type (one for counters, another for gauges, etc.) and that would be very interesting!
this looks very promising ;)
I will update once I do more tests!
@pront I don't know how to properly generate traces in Vector. Is there any minimal configuration example to get traces, similar to what I did with metrics?
I can try to ingest traces within integration tests, but I'd like to do some manual testing as I did with metrics & logs.
EDIT: added trace tests in https://github.com/vectordotdev/vector/pull/21248/commits/c525a45b5c6eb470355124610cc36dfe2533e1f7 and https://github.com/vectordotdev/vector/pull/21248/commits/e9eb6889b228ad337810534fa2511c79258785cc
Added support for metrics and traces in 19796274e230a28daf2c18c29b69e1bae15baf6d. Left a few TODOs for the review and I'm ready for another review round!
Derived from https://github.com/vectordotdev/vector/pull/21248/commits/e9eb6889b228ad337810534fa2511c79258785cc I manually tested composite type insertion
with this config
[sources.demo_logs]
type = "demo_logs"
format = "apache_common"
[transforms.structured]
type = "remap"
inputs = ["demo_logs"]
source = """
. = {
"message": .message,
"structured_field": {
"string": "value",
"integer": 1,
"float": 1.1,
"boolean": true,
}
}
"""
[sinks.postgres]
type = "postgres"
inputs = ["structured"]
endpoint = "postgres://postgres:password123@localhost/test"
table = "structured_table"
and these SQL statements:
CREATE TYPE structured AS (string TEXT, integer BIGINT, float REAL, boolean BOOLEAN);
CREATE TABLE structured_table(message TEXT, structured_field structured);
We can insert pg's composite types (records, https://www.postgresql.org/docs/current/rowtypes.html) from nested fields instead of having to use JSONB for those fields.
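As a rough illustration of why this works (assuming the batch is passed as a JSONB array and expanded with Postgres's jsonb_populate_recordset; the PR's actual query may differ), the nested object gets coerced into the composite column:

use serde_json::json;
use sqlx::{types::Json, PgPool};

// Illustrative sketch: Postgres maps top-level JSON keys to columns and
// coerces the nested "structured_field" object into the composite type
// `structured` defined above.
async fn insert_structured_batch(pool: &PgPool) -> Result<(), sqlx::Error> {
    let batch = json!([{
        "message": "GET /index.html HTTP/1.1",
        "structured_field": { "string": "value", "integer": 1, "float": 1.1, "boolean": true }
    }]);
    sqlx::query(
        "INSERT INTO structured_table SELECT * FROM jsonb_populate_recordset(NULL::structured_table, $1)",
    )
    .bind(Json(batch))
    .execute(pool)
    .await?;
    Ok(())
}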
I would also like to document an example of this in the website docs! This looks great :smile:
Ready for another round, thanks for the review! I left a few threads unresolved as I was unsure; feel free to resolve them if the changes seem OK!
Hi @jorgehermo9, I am looking at this now. Two quick things that popped out:
- The spell checker is failing
- You can add a new scope in .github/workflows/semantic.yml
Hi @pront, fixed them! cargo vdev check events was failing too https://github.com/vectordotdev/vector/actions/runs/13358058774/job/37417026507
This is 99% ready to go. We will need some more docs. See this comment from another PR.
Added initial documentation @pront. I still need to add this note: https://github.com/jorgehermo9/vector/blob/feature/postgres-sink/src/sinks/postgres/integration_tests.rs#L339
I expect to finish it all by the end of the day, but what is done right now can already be reviewed.
sorry for the delay!!
Can't update license files because of
> cargo vdev build licenses
Error: Could not read "/home/jorge/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/allocator-api2-0.2.16/license"
Caused by:
Is a directory (os error 21)
Error: command: cd "/home/jorge/github/vector-fork" && "dd-rust-license-tool" "write"
failed with exit code: 1
When listing that directory,
> tree license
license
├── APACHE
└── MIT
1 directory, 2 files
It seems that those two licenses are contained inside.
Checking Cargo.lock, this is displayed:
[[package]]
name = "allocator-api2"
version = "0.2.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0942ffc6dcaadf03badf6e6a2d0228460359d5e34b57ccdc720b7382dfbd5ec5"
It seems that allocator-api2 v0.2.16 is being used. Browsing its repository at that point, https://github.com/zakarumych/allocator-api2/tree/735f821db310410dc975ba316388eaf6d9751b13/license, it indeed is a directory. But in newer versions such as v0.2.21, https://github.com/zakarumych/allocator-api2/tree/63cd7fcc2f8854b5821c7054d026e8a4647acde1, they got rid of that directory and split the license into two files, LICENSE-APACHE and LICENSE-MIT, at the repository root.
Given this, I think the only two solutions would be to either update dd-rust-license-tool to work with directories, or to bump allocator-api2 to v0.2.21 using a cargo patch, since it is a transitive dependency.
I think I fixed it with https://github.com/vectordotdev/vector/pull/21248/commits/e358befd2bcae148a193796709aeaf3a1531ca14
Just out of curiosity, how did you do it? I ran cargo vdev build licenses and it didn't work :open_mouth:
Oh, I see, raw dd-rust-license-tool write works... I wonder if we broke cargo vdev build licenses for everyone with this... :disappointed:
spell checking fixes are required.
I think spellcheck was fixed in https://github.com/vectordotdev/vector/pull/21248/commits/627dbba40545a73046f3080372d057f79dcc61c4
cargo vdev build licenses
You can try cargo vdev check licenses, and we also have the CI.
Addressed all comments in 8a204693aa029ba4ac15fc18792a4a3c82642984. I had to update the website layout to handle exactly_once delivery, as this was the first sink of that kind!
Exciting, thank you @jorgehermo9!
Thank you very much for your great help here @pront! I know this has been a long one and I'm very grateful for your patience!
Thanks for this amazing contribution @jorgehermo9 :heart:
@pront In which release do you plan to ship it?
@jmaupetit it will land in the next release, which is scheduled for Monday: https://calendar.google.com/calendar/u/0/[email protected]&ctz=America/New_York&pli=1