vector icon indicating copy to clipboard operation
vector copied to clipboard

feat(sink): Add postgres sink

Open jorgehermo9 opened this issue 1 year ago β€’ 1 comments

Closes #15765

This PR is not 100% ready by my side and there will likely be a few things wrong, but had a few questions and wanted to know if the direction seems right... So I would like an initial round of review if possible.

I tested the sink and it seems to be working, but I lack a lot of knowledge about Vector's internals and I'm not sure if the implementation is okay.

I inspired a lot from the databend and clickhouse sinks, but left a few questions as TODOs in the source. I found this sink a bit different from the others, as the others had the request_builder thing and encoding the payload in bytes (as most of the sinks are http based).. But I didn't think that fitted well in this case, as in the sqlx API I should wrap the events with the sqlx::types::Json type and that will do all the encoding with serde internally.

If someone want to manually test it, I used this Vector config:

[sources.demo_logs]
type = "demo_logs"
format = "apache_common"

[transforms.payload]
type = "remap"
inputs = ["demo_logs"]
source = """
.payload = .
"""

[sinks.postgres]
type = "postgres"
inputs = ["payload"]
endpoint = "postgres://postgres:postgres@localhost/test"
table = "test"

Run postgres server with podman run -e POSTGRES_PASSWORD=postgres -p 5432:5432 docker.io/postgres

and execute the following with psql -h localhost -U postgres:

CREATE DATABASE test;

then execute \c test and last:

CREATE TABLE test (message TEXT, timestamp TIMESTAMP WITH TIME ZONE, payload JSONB);

And then, you will see logs in that table:

image

jorgehermo9 avatar Sep 09 '24 22:09 jorgehermo9

Thank you very much for the review @pront! I'm kinda busy these days but I will revisit this as soon as I can :)

jorgehermo9 avatar Oct 15 '24 20:10 jorgehermo9

There are a few failing checks. Also, let's add a new postgres semantic scope in https://github.com/vectordotdev/vector/blob/master/.github/workflows/semantic.yml. I will review once these are addressed. Thank you!

pront avatar Nov 25 '24 17:11 pront

I will work on this PR these days, I'll ping you whenever it is ready for another round. Thank you so much @pront!

jorgehermo9 avatar Nov 25 '24 18:11 jorgehermo9

Hi @pront. I'm sorry for the delay. I added a bunch of new integration tests and I'm ready for another review round.

I still have a few doubts left maked as // TODO in code.

I also wonder if I should enable metrics & traces inputs in this shink (see this thread https://github.com/vectordotdev/vector/pull/21248#discussion_r1855446211). If so, I will add a few more tests similar to insert_multiple_events test I added in this PR. Metrics can be useful for sinks like timescale https://github.com/vectordotdev/vector/issues/939. But I don't know if the current postgres sink is compatible with timescaledb. I think with timescaledb we should use this same sink but with other flag (see this comment, so I think we should really think ahead if this sink will be used for more than just logs.

I think there are a few opened comment threads, feel free to resolve them if you think my answer is enough. Didn't resolve them myself just in case my answer is not clear.

jorgehermo9 avatar Dec 22 '24 10:12 jorgehermo9

Hi @pront. I'm sorry for the delay. I added a bunch of new integration tests and I'm ready for another review round.

I still have a few doubts left maked as // TODO in code.

Hi @jorgehermo9, I will take another look now. Thanks!

I also wonder if I should enable metrics & traces inputs in this shink (see this thread #21248 (comment)). If so, I will add a few more tests similar to insert_multiple_events test I added in this PR.

Yes, I was going to suggest exactly this. Add more inputs and observe how it works.

Metrics can be useful for sinks like timescale #939. But I don't know if the current postgres sink is compatible with timescaledb. I think with timescaledb we should use this same sink but with other flag (see this comment, so I think we should really think ahead if this sink will be used for more than just logs.

Are we talking about timescaledb and risingwave? Are there more? I am not familiar with all the nuances and I don't think we can plan for all extensions in advance. The proposed flavour (or flavor) property can be added incrementally without breaking the behavior of the sink.

pront avatar Jan 03 '25 15:01 pront

Just dropping by to say that I am looking forward to introducing this new integration in Vector. And don't hesitate to give me a shout if you need anything.

pront avatar Jan 24 '25 15:01 pront

Hi @pront! I'm sorry that I couldn't work on this these past weeks. I still have it on my radar and planning to answer your review ASAP.

Thank you for your reminder and I can (and will) spend more time on this in the following weeks. Sorry for the delay!

jorgehermo9 avatar Jan 24 '25 15:01 jorgehermo9

Of course! No need to apologize at all πŸ˜…

I just wanted to express my excitement about this work and to let you know I’m here to support whenever you’re ready.

pront avatar Jan 24 '25 16:01 pront

Hi @pront, I have more time available these days and will work in this PR. I have left to test how this sink behaves with metrics and traces and enable them if they work ok. Will update about that soon. I think I answered all of your comments, but correct me if I'm wrong

I will also think about adding more integration tests (and if traces & metrics works, I'll add integration tests for that too)

Are we talking about timescaledb and risingwave? Are there more? I am not familiar with all the nuances and I don't think we can plan for all extensions in advance. The proposed flavour (or flavor) property can be added incrementally without breaking the behavior of the sink.

Yes, I was talking about those two specifically. I agree to not plan in advance, we can look into that in future PRs

jorgehermo9 avatar Feb 07 '25 17:02 jorgehermo9

Hi @pront, I have more time available these days and will work in this PR. I have left to test how this sink behaves with metrics and traces and enable if its works ok. Will update about that soon. I think I answered all of your comments, but correct me if I'm wrong

πŸŽ‰ Awesome, I will prioritize reviewing this PR.

pront avatar Feb 07 '25 17:02 pront

I tried with metrics (I have left to test with traces)

with this config

[sources.internal_metrics]
type = "internal_metrics"

[sinks.postgres]
type = "postgres"
inputs = ["internal_metrics"]
endpoint = "postgres://postgres:password123@localhost/test"
table = "metrics"

and this commands to spin-up a local postgres and connect to it:

docker run -e POSTGRES_PASSWORD=password123 -p 5432:5432 docker.io/postgres
psql -h localhost -U postgres

and this sql statements inside psql:

CREATE DATABASE test;
\c test
CREATE TABLE metrics(name TEXT, namespace TEXT, tags JSONB, timestamp TIMESTAMP WITH TIME ZONE, kind TEXT, counter JSONB, gauge JSONB, aggregated_histogram JSONB);

The table is populated with the given metrics and we can query it: image

I had to create a database with the union of the fields of all metrics (counter JSONB, gauge JSONB, aggregated_histogram JSONB) but once the table parameter is templatable, we could have different postgres tables per metric type (one for counters, other for gauges, etc) and that would be very instersting!

this looks very promising ;)

I will update once I do more tests!

jorgehermo9 avatar Feb 08 '25 13:02 jorgehermo9

@pront I don't know how to propertly generate traces in Vector.. Is there any minimal configuration example to get traces similar to what I did with metrics?

I can try to ingest traces within integration tests, but I'd like to do a manual testing as I did with metrics & logs

EDIT: added traces tests in https://github.com/vectordotdev/vector/pull/21248/commits/c525a45b5c6eb470355124610cc36dfe2533e1f7 and https://github.com/vectordotdev/vector/pull/21248/commits/e9eb6889b228ad337810534fa2511c79258785cc

jorgehermo9 avatar Feb 08 '25 15:02 jorgehermo9

Added support for metrics and traces in 19796274e230a28daf2c18c29b69e1bae15baf6d. Left a few TODOs for the review and Im ready for another review round!

jorgehermo9 avatar Feb 09 '25 11:02 jorgehermo9

Derived from https://github.com/vectordotdev/vector/pull/21248/commits/e9eb6889b228ad337810534fa2511c79258785cc I manually tested composite type insertion

with this config

[sources.demo_logs]
type = "demo_logs"
format = "apache_common"


[transforms.structured]
type = "remap"
inputs = ["demo_logs"]
source = """
. = {
    "message": .message,
    "structured_field": {
        "string": "value",
        "integer": 1,
        "float": 1.1,
        "boolean": true,
    }
}
"""

[sinks.postgres]
type = "postgres"
inputs = ["structured"]
endpoint = "postgres://postgres:password123@localhost/test"
table = "structured_table"

and those sql statements:

CREATE TYPE structured AS (string TEXT, integer BIGINT, float real, boolean BOOLEAN);
CREATE TABLE structured_table(message text, structured_field structured);

we can insert pg's composite types (records) https://www.postgresql.org/docs/current/rowtypes.html from nested fields, instead of having to use JSONB for those fields

image

I would like to document also an example of this in the website docs! This looks great :smile:

jorgehermo9 avatar Feb 09 '25 13:02 jorgehermo9

Ready for another round, thanks for the review! Left unresolved a few threads as I was unsure, feel free to resolve them if the changes seem ok!

jorgehermo9 avatar Feb 16 '25 19:02 jorgehermo9

Hi @jorgehermo9, I am looking at this now. Two quick things that popped out:

  • The spell checker is failing
  • You can add a new scope in .github/workflows/semantic.yml

pront avatar Feb 18 '25 17:02 pront

Hi @pront, fixed them! cargo vdev check events was failing too https://github.com/vectordotdev/vector/actions/runs/13358058774/job/37417026507

jorgehermo9 avatar Feb 18 '25 18:02 jorgehermo9

This 99% ready to go. We will need some more docs. See this comment from another PR.

pront avatar Feb 18 '25 18:02 pront

Added an initial documentation @pront, I still have pending to add this note https://github.com/jorgehermo9/vector/blob/feature/postgres-sink/src/sinks/postgres/integration_tests.rs#L339

expect to finish all by end of day, but what is done right now can be reviewed though.

sorry for the delay!!

jorgehermo9 avatar Mar 04 '25 16:03 jorgehermo9

Can't update license files because of

> cargo vdev build licenses
Error: Could not read "/home/jorge/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/allocator-api2-0.2.16/license"

Caused by:
    Is a directory (os error 21)
Error: command: cd "/home/jorge/github/vector-fork" && "dd-rust-license-tool" "write"
  failed with exit code: 1

When listing that directory,

> tree license
license
β”œβ”€β”€ APACHE
└── MIT

1 directory, 2 files

It seems that those two licenses are contained inside.

Checking Cargo.lock, this is displayed:

[[package]]
name = "allocator-api2"
version = "0.2.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0942ffc6dcaadf03badf6e6a2d0228460359d5e34b57ccdc720b7382dfbd5ec5"

It seems that [email protected] is being used. Browsing its repository at that point, https://github.com/zakarumych/allocator-api2/tree/735f821db310410dc975ba316388eaf6d9751b13/license It indeed is a directory. But, in never versions such as [email protected] https://github.com/zakarumych/allocator-api2/tree/63cd7fcc2f8854b5821c7054d026e8a4647acde1 they got rid of that directory and split the license files into two LICENSE-APACHE and LICENSE-MIT license files at the root repo.

Given this, I think the only two solutions would be to either update dd-rust-license-tool to work with directories, or to bump allocator-api to v0.2.21 using a cargo patch, as it is a transitive dependency.

jorgehermo9 avatar Mar 04 '25 23:03 jorgehermo9

Can't update license files because of

> cargo vdev build licenses
Error: Could not read "/home/jorge/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/allocator-api2-0.2.16/license"

Caused by:
    Is a directory (os error 21)
Error: command: cd "/home/jorge/github/vector-fork" && "dd-rust-license-tool" "write"
  failed with exit code: 1

When listing that directory,

> tree license
license
β”œβ”€β”€ APACHE
└── MIT

1 directory, 2 files

It seems that those two licenses are contained inside.

Checking Cargo.lock, this is displayed:

[[package]]
name = "allocator-api2"
version = "0.2.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0942ffc6dcaadf03badf6e6a2d0228460359d5e34b57ccdc720b7382dfbd5ec5"

It seems that [email protected] is being used. Browsing its repository at that point, https://github.com/zakarumych/allocator-api2/tree/735f821db310410dc975ba316388eaf6d9751b13/license It indeed is a directory. But, in never versions such as [email protected] https://github.com/zakarumych/allocator-api2/tree/63cd7fcc2f8854b5821c7054d026e8a4647acde1 they got rid of that directory and split the license files into two LICENSE-APACHE and LICENSE-MIT license files at the root repo.

Given this, I think the only two solutions would be to either update dd-rust-license-tool to work with directories, or to bump allocator-api to v0.2.21 using a cargo patch, as it is a transitive dependency.

I think I fixed it with https://github.com/vectordotdev/vector/pull/21248/commits/e358befd2bcae148a193796709aeaf3a1531ca14

pront avatar Mar 05 '25 19:03 pront

I think I fixed it with https://github.com/vectordotdev/vector/commit/e358befd2bcae148a193796709aeaf3a1531ca14

Just for curiosity sake, how did you do it? I ran cargo vdev build licenses and didn't work :open_mouth:

jorgehermo9 avatar Mar 05 '25 20:03 jorgehermo9

Oh, I see, raw dd-rust-license-tool write works... I wonder if we broke cargo vdev build licenses for everyone with this... :disappointed:

jorgehermo9 avatar Mar 05 '25 20:03 jorgehermo9

spell checking fixes are required.

I think spellcheck was fixed in https://github.com/vectordotdev/vector/pull/21248/commits/627dbba40545a73046f3080372d057f79dcc61c4

jorgehermo9 avatar Mar 05 '25 20:03 jorgehermo9

cargo vdev build licenses

You can try cargo vdev check licenses and also we have the CI

pront avatar Mar 05 '25 20:03 pront

Addressed all comments in 8a204693aa029ba4ac15fc18792a4a3c82642984. I had to update the website layout to handle exactly_once delivery, as this was the first sink of that kind!

jorgehermo9 avatar Mar 05 '25 20:03 jorgehermo9

Exciting, thank you @jorgehermo9 πŸŽ‰

pront avatar Mar 06 '25 21:03 pront

Thank you very much for your great help here @pront! I know this have been a long one and I'm very grateful for your patience! πŸŽ‰πŸŽ‰

jorgehermo9 avatar Mar 06 '25 21:03 jorgehermo9

Thanks for this amazing contribution @jorgehermo9 :heart:

@pront In which release do you plan to ship it?

jmaupetit avatar Apr 02 '25 10:04 jmaupetit

@jmaupetit it will land on the next release which is scheduled on monday https://calendar.google.com/calendar/u/0/[email protected]&ctz=America/New_York&pli=1

jorgehermo9 avatar Apr 02 '25 10:04 jorgehermo9