ClickHouse icon indicating copy to clipboard operation
ClickHouse copied to clipboard

Postgres certificate error

Open mausch opened this issue 1 year ago • 23 comments

Hi, I upgraded Clickhouse from 24.6 to 24.9 (using the operator)

A query using the postgresql function ( https://clickhouse.com/docs/en/sql-reference/table-functions/postgresql ) started failing after this upgrade:

 HTTPDriver for http://clickhouse-clickhouse.my-clickhouse.svc.cluster.local:8123 returned response code 500)
   Code: 614. DB::Exception: Try 2. Connection to `postgres.default.svc.cluster.local:5432` failed with error: connection to server at "postgres.default.svc.cluster.local" (10.100.187.138), port 5432 failed: could not open certificate file "/root/.postgresql/postgresql.crt": Permission denied
  connection to server at "postgres.default.svc.cluster.local" (10.100.187.138), port 5432 failed: FATAL:  no pg_hba.conf entry for host "172.16.210.196", user "elevate", database "elevate_develop", no encryption

I fixed this by setting the PGSSLCERT env var on the podTemplate. ref: https://github.com/nginx/unit/issues/834#issuecomment-1410297997

Maybe some libpq upgrade in Clickhouse triggered this? Users should not need to set this env var I don't think.

mausch avatar Oct 04 '24 11:10 mausch

👍🏻

mickdelaney avatar Oct 04 '24 11:10 mickdelaney

As I am running in Kubernetes and wanted to avoid a restart I just created the directory /root/.postgresql and made it readable. Still just a workaround, was not needed before upgrading to 24.9.1.

zommarin avatar Oct 04 '24 12:10 zommarin

We have a few remote tables using the PostgreSQL engine and encountered the same error after upgrading ClickHouse from 24.8.3.59-alpine to 24.9.2.42-alpine

@mausch W/A did help, thanks!

EinavDanielDX avatar Oct 08 '24 13:10 EinavDanielDX

@rschu1ze it seems this might have been introduced in the recent libraries update (https://github.com/ClickHouse/ClickHouse/pull/69580 or https://github.com/ClickHouse/ClickHouse/pull/69741)

Algunenano avatar Oct 09 '24 06:10 Algunenano

I indeed recently bumped Postgres-related libraries needed by the Postgres table function as the previous code dump was not maintainable

libpq

  • https://github.com/ClickHouse/ClickHouse/pull/69564
  • https://github.com/ClickHouse/ClickHouse/pull/69674
  • https://github.com/ClickHouse/ClickHouse/pull/69741

libpqxx

  • https://github.com/ClickHouse/ClickHouse/pull/69580
  • (not merged) https://github.com/ClickHouse/ClickHouse/pull/69572

Above PRs were done quite mechanically: update the submodule in small version increments, re-apply (or drop) custom patches, check if tests fail. I am no PostgreSQL expert, I actually did not even setup the Postgres table engine by myself during this exercise. Obviously, none of the existing tests caught the issue (though, it was not proven yet that actually my PRs caused the problem.)

What would be helpful is a writeup of the reproduction steps, starting from a scratch ClickHouse and a scratch Postgres (again, I am PostgreSQL noob). Also, in case someone likes to bisect this: most PRs include smaller commits. Start at the bottom, revert them one-by-one and check if the problem persists.

EDIT: While writing this, I see https://github.com/nginx/unit/issues/834 which was referenced above. It says libpq 15.1 is to blame (which we get by https://github.com/ClickHouse/ClickHouse/pull/69674) and then the issue was closed. So does that mean that the behavior is expected? Is there anything to fix at all?

rschu1ze avatar Oct 14 '24 19:10 rschu1ze

Is there anything to fix at all?

I think Clickhouse should either:

  • make sure the process in the docker container has access to /root and/or /root/.postgresql
  • run the process in the docker container as a different user

mausch avatar Oct 14 '24 22:10 mausch

@mausch Is it right to assume that you are using the official ClickHouse Docker image from dockerhub?

I think this still needs a writeup of the exact steps to reproduce this error ...

rschu1ze avatar Oct 15 '24 06:10 rschu1ze

Is it right to assume that you are using the official ClickHouse Docker image from dockerhub?

Yep I'm using the default images managed by the Clickhouse operator.

I think this still needs a writeup of the exact steps to reproduce this error ...

Basically: run Clickhouse docker 24.9.2.42-alpine and attempt to connect to Postgres from it.

mausch avatar Oct 15 '24 07:10 mausch

Alternatively you can install ClickHouse on a fresh Ubuntu machine link, and connect to a fresh Postgres installation @rschu1ze . In my test I used Postgres 15.

Or if you have an existing machine, the error is introduced when updating from 24.6.* to 24.9.2.42 (we did this 3 days ago in our test environment, and the error message seems to be the same).

Let me know if you want me to try something out, or to test something.

hauthorn avatar Oct 17 '24 12:10 hauthorn

I am running 24.9.2.42 and can confirm the issue exists on this version. I have also attempted with 24.10.1.2812. I tried recreating the database and the error still exist.

jaitaiwan avatar Nov 03 '24 14:11 jaitaiwan

temp fix, add the following to env variable:

 env:
                - name: PGSSLCERT
                  value: /tmp/postgresql.crt

yannick avatar Nov 12 '24 08:11 yannick

I have similar and TBH i tried all options suggested in this thread. I have Postgres in RDS (AWS) and clickhouse 24.9.2.42 in docker container . I keep on getting this error: port 5432 failed: could not stat private key file "/root/.postgresql/postgresql.key": Permission denied and the query tried from clickhouse is this

insert into table select id, item_id, company_id, 1 as sign,toUnixTimestamp(modified_at) AS version from postgresql("host","database","table", "user","pass","schema", 'sslmode=verify-full;sslrootcert=/etc/clickhouse-client/ssl/postresql.crt')

NOTE: Tried without ssl mode in option as well

Does anyone know how we can fix this?

work around provided by yannick also did not work.

Raghavsalotra avatar Nov 29 '24 12:11 Raghavsalotra

I looked into this issue and found out that (besides all the problems with defaults) we don't provide optional connection parameters neither with mysql() nor postgresql() functions, thought libraries we are using (libpqxx for postgres and mariadb lib for mysql) have these options in their APIs...

yakov-olkhovskiy avatar Dec 01 '24 21:12 yakov-olkhovskiy

HI. This still is a problem in 25.1 both alpine and ubuntu.

What's the actual fix? Why does this happen?

timaelliott avatar Feb 05 '25 21:02 timaelliott

Is there any fix for this? This is a real issue. It works on v24.3.18, but not on v25.1.5 - This is verified.

INSERT INTO db.table SELECT *
FROM postgresql('host', 'database', 'table', 'postgres', 'password');

Error:

Received exception from server (version 25.1.5):
Code: 614. DB::Exception: Received from clickhouse-db-0.clickhouse-db.clickhouse.svc.cluster.local:9000. DB::Exception: Try 2. Connection to `host:5432` failed with error: connection to server at "host" (10.2.4.238), port 5432 failed: could not open certificate file "/root/.postgresql/postgresql.crt": Permission denied
connection to server at "host" (10.2.4.238), port 5432 failed: FATAL:  no pg_hba.conf entry for host "10.2.3.96", user "postgres", database "demo_prod", no encryption

. (POSTGRESQL_CONNECTION_FAILURE)

KaranJagtiani avatar Mar 04 '25 17:03 KaranJagtiani

Sorry, at the risk of repeating myself. This issue needs a minimally reproducible reproducer with a step-by-step guide how to run into the problem (Kubernetes and setting environment variables in shady places as workaround doesn't count).

rschu1ze avatar Mar 05 '25 09:03 rschu1ze

step-by-step guide how to run into the problem (Kubernetes and setting environment variables in shady places as workaround doesn't count).

I've already mentioned how to repro in https://github.com/ClickHouse/ClickHouse/issues/70365#issuecomment-2413122055 without Kubernetes or "setting environment variables in shady places" 🙂

To be super clear: setting environment variables is a workaround for the issue, not part of the repro.

mausch avatar Mar 05 '25 10:03 mausch

Running ClickHouse in kubernetes. Upgraded from 24.8-LTS to 25.3-LTS and faced this issue. The env var workaround solved it.

mxalis avatar Apr 11 '25 10:04 mxalis

Sorry, at the risk of repeating myself. This issue needs a minimally reproducible reproducer with a step-by-step guide how to run into the problem (Kubernetes and setting environment variables in shady places as workaround doesn't count).

What format would you like? I believe a few of us have attempted to share how we reproduce the issue. Let me know if I can be more concrete or help you in some way @rschu1ze.

hauthorn avatar Apr 15 '25 08:04 hauthorn

Sorry, I don't find the time to work on this in soon future.

rschu1ze avatar Apr 15 '25 08:04 rschu1ze

As soon as a workaround exists, and it is an upstream library issue, let's close.

alexey-milovidov avatar Jun 15 '25 20:06 alexey-milovidov

I'm surprised by the response from the co-founder/CTO.

This issue stems from ClickHouse upgrading libpq without adequately testing for breaking changes. Whether the breaking change is documented or a bug in libpq is ultimately irrelevant. ClickHouse users interface with ClickHouse, not its dependencies. They expect it to work reliably.

While a workaround exists, the problem prevents users from running ClickHouse in Docker or Kubernetes environments and connecting to Postgres, arguably one of its most valuable integrations, unless they search and find this issue and bother to apply the workaround.

One of the main selling points of Clickhouse for us and likely many other companies is its ease of integration with Postgres and other data sources. Being able to run it locally is critical for prototyping and validating a proof of concept before eventually migrating to ClickHouse Cloud for production workloads. Leaving this issue unresolved risks losing potential customers, not just because of the technical problem, but also due to the perceived lack of support.

mausch avatar Jun 15 '25 22:06 mausch

Thanks, adding "help wanted".

alexey-milovidov avatar Jun 15 '25 23:06 alexey-milovidov

We were using an AWS Aurora Postgres DB with ClickHouse and came across this error trying to create a DB using the MaterializedPostgreSQL engine. Apparently this error can manifest in the case of insufficient permissions - we used the reader endpoint instead of the writer endpoint, and this engine needs to be able to create a replication slot. Using the writer endpoint solved the issue.

Just posting in case anyone else finds themselves here with a similar configuration.

benmurden avatar Sep 24 '25 06:09 benmurden