
New Connect healthcheck fails with http listener and CONNECT_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM=https

Open ybyzek opened this issue 4 years ago • 4 comments

Running cp-all-in-one-cloud produces a false negative: the connect container is reported as unhealthy even though it is working fine.

connect           /etc/confluent/docker/run   Up (unhealthy)   0.0.0.0:8083->8083/tcp, 9092/tcp
docker inspect --format='{{json .State.Health}}' connect
{"Status":"starting","FailingStreak":74,"Log":[{"Start":"2019-09-12T11:16:32.1104066Z","End":"2019-09-12T11:16:32.215473Z","ExitCode":1,"Output":"Thu Sep 12 11:16:32 UTC 2019 \tKafka Connect with SSL listener HTTP state:  200  (waiting for 200)\n"},{"Start":"2019-09-12T11:16:37.2258327Z","End":"2019-09-12T11:16:37.3358418Z","ExitCode":1,"Output":"Thu Sep 12 11:16:37 UTC 2019 \tKafka Connect with SSL listener HTTP state:  200  (waiting for 200)\n"},{"Start":"2019-09-12T11:16:42.3429174Z","End":"2019-09-12T11:16:42.4463996Z","ExitCode":1,"Output":"Thu Sep 12 11:16:42 UTC 2019 \tKafka Connect with SSL listener HTTP state:  200  (waiting for 200)\n"},{"Start":"2019-09-12T11:16:47.4539929Z","End":"2019-09-12T11:16:47.5656872Z","ExitCode":1,"Output":"Thu Sep 12 11:16:47 UTC 2019 \tKafka Connect with SSL listener HTTP state:  200  (waiting for 200)\n"},{"Start":"2019-09-12T11:16:52.5760447Z","End":"2019-09-12T11:16:52.6728641Z","ExitCode":1,"Output":"Thu Sep 12 11:16:52 UTC 2019 \tKafka Connect with SSL listener HTTP state:  200  (waiting for 200)\n"}]}

The issue is this line: https://github.com/confluentinc/cp-docker-images/blob/5.3.1-post/debian/kafka-connect-base/include/etc/confluent/docker/healthcheck.sh#L3

CONNECT_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM is defined (because this Connect worker connects to Confluent Cloud), but the listener is http, not https.

This is a result of https://github.com/confluentinc/cp-docker-images/pull/769/ (cc: @rmoff )

ybyzek avatar Sep 12 '19 11:09 ybyzek

I picked the wrong env var to try to determine whether Connect's REST endpoint was running HTTPS. CONNECT_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM actually has nothing to do with it.

From what I can tell based on https://docs.confluent.io/current/connect/security.html#configuring-the-kconnect-rest-api-for-http-or-https, HTTPS is enabled simply by configuring listeners (and therefore CONNECT_LISTENERS).

The script needs to change to read CONNECT_LISTENERS, split it on commas if it contains more than one entry, and use the first entry, based on the example from the above doc: listeners=http://localhost:8080,https://localhost:8443
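The parsing step described above could be sketched roughly as follows. This is only an illustration, not the code in healthcheck.sh: the helper names `first_listener` and `listener_scheme` are hypothetical, and the fallback value `http://localhost:8083` used when CONNECT_LISTENERS is unset is an assumption.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the first entry of a comma-separated listeners string.
first_listener() {
  # Remove everything from the first comma onward; a single entry is left unchanged.
  printf '%s\n' "${1%%,*}"
}

# Hypothetical helper: print the scheme (http or https) of a listener URL.
listener_scheme() {
  # Remove everything from "://" onward, leaving only the scheme.
  printf '%s\n' "${1%%://*}"
}

# Assumption: fall back to a plain HTTP listener when CONNECT_LISTENERS is unset.
listeners="${CONNECT_LISTENERS:-http://localhost:8083}"
first="$(first_listener "$listeners")"
echo "scheme=$(listener_scheme "$first") listener=$first"
```

With the doc's example value, the first listener is `http://localhost:8080`, so a healthcheck built this way would probe over plain HTTP regardless of whether CONNECT_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM is set.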

rmoff avatar Sep 25 '19 14:09 rmoff

@chuck-confluent and I had a similar problem this week. The healthcheck.sh returns an unhealthy status when the CONNECT_REST_PORT env variable isn't set. We have docker-compose files that just go with the default.

I see CONNECT_REST_PORT is given a default value here: https://github.com/confluentinc/cp-docker-images/blob/5.3.1-post/debian/kafka-connect-base/include/etc/confluent/docker/configure#L32. But this isn't in scope where the healthcheck runs.

I'm not sure what the fix would be. Should the healthcheck be implementing the same defaults that connect does, i.e. ${CONNECT_REST_PORT:-8083}?
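For illustration, a healthcheck that applies that default might look something like the sketch below. The function names `healthcheck_url` and `connect_is_healthy` are hypothetical, and probing `http://localhost` at the root path is an assumption rather than what the shipped script does.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build the URL to probe, falling back to port 8083
# when CONNECT_REST_PORT is unset or empty.
healthcheck_url() {
  printf 'http://localhost:%s/\n' "${CONNECT_REST_PORT:-8083}"
}

# Hypothetical helper: succeed only when the REST endpoint answers with HTTP 200.
connect_is_healthy() {
  # -s silences progress output, -o discards the body, -w prints only the status code.
  status="$(curl -s -o /dev/null -w '%{http_code}' "$(healthcheck_url)")"
  [ "$status" = "200" ]
}
```

A healthcheck script could then end with `connect_is_healthy || exit 1`, so compose files that rely on the default port would no longer be reported as unhealthy.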

russau avatar Nov 07 '19 18:11 russau

> Should the healthcheck be implementing the same defaults that connect does, i.e. ${CONNECT_REST_PORT:-8083}

Yes, that sounds sensible

rmoff avatar Nov 11 '19 10:11 rmoff

Any update on this? Is a fix planned for a future release? For now, we are patching the healthcheck ourselves, since we use CONNECT_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM but don't have SSL ports open on the connect container.

gvdmarck avatar Dec 10 '19 16:12 gvdmarck