Upgrade ClickHouse to 21.10.2.15-stable or later

Open 0xr1 opened this issue 2 years ago • 19 comments

Problem Statement

Seven RCE and DoS vulnerabilities were recently disclosed in the ClickHouse DBMS. More details here: https://jfrog.com/blog/7-rce-and-dos-vulnerabilities-found-in-clickhouse-dbms/

Fix

To fix the issues, update ClickHouse to v21.10.2.15-stable or later.
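A minimal sketch of the version check this implies, in Python. The helper names `parse_version` and `is_patched` are hypothetical, and the code assumes version strings start with a dotted numeric part, optionally prefixed with `v` and followed by a suffix such as `-stable`. A plain numeric comparison like this cannot tell whether an older LTS or vendor build has had the fixes backported, so treat a `False` result as "needs checking" rather than "vulnerable".

```python
import re

# Minimum fixed release named in this issue.
MIN_FIXED = (21, 10, 2, 15)


def parse_version(version: str) -> tuple:
    """Extract the leading dotted numeric part, e.g. 'v21.10.2.15-stable' -> (21, 10, 2, 15)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_patched(version: str) -> bool:
    """Return True if the numeric version is at or above MIN_FIXED."""
    return parse_version(version) >= MIN_FIXED
```

For example, `is_patched("20.3.9.70")` is false under this comparison, while `is_patched("v21.10.2.15-stable")` is true.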

If upgrading is not possible, add firewall rules in the server that will restrict the access to the web port (8123) and the TCP server’s port (9000) to specific clients only.
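One way to sanity-check such firewall rules is to probe the two ports from a machine that should not have access. The sketch below uses only the Python standard library; `is_port_open` is a hypothetical helper name, and a `False` result only means the TCP connection did not succeed within the timeout (blocked, closed, or unreachable host).

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Running `is_port_open(host, 8123)` and `is_port_open(host, 9000)` from an untrusted client should both return `False` once the rules are in place, where `host` is the address of your own ClickHouse server.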

Solution Brainstorm

The ClickHouse version needs to be bumped here: https://github.com/getsentry/self-hosted/blob/master/docker-compose.yml#L189
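To see which ClickHouse image a compose file currently pins, something like the following could work. It is a rough sketch that scans the file text with a regular expression rather than parsing YAML; the function name `clickhouse_images` and the `SAMPLE` fragment (including its image name) are made up for illustration and do not reproduce the real `docker-compose.yml`.

```python
import re

# Matches lines such as '    image: "some/clickhouse-server:1.2.3.4"' (quotes optional).
IMAGE_LINE = re.compile(r"""^\s*image:\s*["']?(?P<image>[^\s"'#]+)""", re.MULTILINE)


def clickhouse_images(compose_text: str) -> list:
    """Return every image reference in the text that mentions 'clickhouse'."""
    return [
        m.group("image")
        for m in IMAGE_LINE.finditer(compose_text)
        if "clickhouse" in m.group("image").lower()
    ]


# Illustrative compose fragment; the real file's contents may differ.
SAMPLE = """\
services:
  redis:
    image: "redis:6.2.4-alpine"
  clickhouse:
    image: "example/clickhouse-server:20.3.9.70"
"""

print(clickhouse_images(SAMPLE))
```

Images supplied through environment-variable substitution would need the variable resolved first; this sketch does not attempt that.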

0xr1 avatar Mar 16 '22 15:03 0xr1

Seems like an easy PR @0xr1. 😉

Not sure how strict we need/want to be with ClickHouse version in SaaS vs. self-hosted, tho ... cf. #1097

restrict the access to the web port (8123) and the TCP server’s port (9000) to specific clients only

Are these locked down in any way in a stock self-hosted install? I think not but maybe, or maybe there is something we can do to lock them down if upgrading ClickHouse isn't as easy as it seems.

chadwhitacre avatar Mar 16 '22 16:03 chadwhitacre

I asked around internally about this and did some digging:

  1. The version we're pinned to in self-hosted tracks the minimum version we're pinned to in Snuba: 20.3.9.70.
  2. In dev envs we switch to altinity/clickhouse-server:21.6.1.6734-testing-arm under ARM, because there are no stable builds supporting ARM.
  3. Altinity is a third-party ClickHouse hosting company (TIL). We are migrating to their Altinity Stable Build, which tracks upstream ClickHouse, sometimes adding critical fixes. The main benefit is that these are known to have been operated by Altinity for their cloud offering, and they include docs on how to upgrade safely.
  4. We’re trying to move everything from stock 20.3 to Altinity 21.8. We have an initial cluster stood up, but it's not yet serving errors/transactions in SaaS. There are unresolved issues (internal doc) with versions > 20.3.9.70, so it's a no-go for self-hosted at this point.
  5. There are multiple breaking changes between 20.3 and 21.8; it is not a simple drop-in migration even if we were already on Altinity 20.3 (vs. stock).
  6. The path we are currently taking to upgrade errors/transactions in SaaS is not the same as the one self-hosted customers will take. We are currently doing dual writes to avoid landing in a situation where we don’t know how to recover. We will eventually get to discussing how to upgrade in place, which would be applicable for self-hosted.

tl;dr It seems we're a ways off from moving past 20.3.

chadwhitacre avatar Apr 18 '22 22:04 chadwhitacre

Is there an Altinity Stable 20.3 that has security fixes and would work for self-hosted?

chadwhitacre avatar Apr 18 '22 22:04 chadwhitacre

It sounds like SaaS has moved to 21.8, so the hard part of the work is out of the way. For ARM I think we can use a stable image altinity/clickhouse-server:21.8.12.29.altinitydev.arm, and whatever version of clickhouse SaaS uses for x86. It sounds like the upgrade is not straightforward, so we may need to make a hard stop soon (which could also be useful for https://github.com/getsentry/self-hosted/pull/1703).
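As a sketch of how the per-architecture choice could look, assuming the ARM tag mentioned above and an x86 image supplied by the caller (this comment does not name one). The function name `pick_clickhouse_image` is hypothetical, and this is not how the install script actually selects images.

```python
import platform
from typing import Optional

# ARM image suggested in this comment.
ARM_IMAGE = "altinity/clickhouse-server:21.8.12.29.altinitydev.arm"


def pick_clickhouse_image(x86_image: str, machine: Optional[str] = None) -> str:
    """Choose the ARM image on ARM hosts, otherwise the caller-supplied x86 image."""
    # Fall back to the local machine type when none is passed in.
    machine = (machine or platform.machine()).lower()
    if machine in {"arm64", "aarch64"}:
        return ARM_IMAGE
    return x86_image
```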

As far as I can tell, since 21.8 is an LTS, it got the security patches for the above-mentioned CVEs.

emmatyping avatar Oct 14 '22 17:10 emmatyping

Update

  1. Prod is on 21.8; it’s difficult for Sentry devs on M1 Macs to test features on CH 20.3 for compatibility
  2. Some future changes might be blocked on CH
  3. Replays broke for CH 20.3, what’s to say that won’t happen again in the future?
  4. Feels like it’s a very low priority item for SNS to internally validate if ingested data on 20.3 is compatible with 21.8, don’t know if they will ever get to it (slack thread🔒)
  5. ClickHouse vulnerabilities in the current version (CVEs)

chadwhitacre avatar Mar 21 '23 21:03 chadwhitacre

Notes from talking with SnS team:

  • We should be able to go directly onto 21.8 after shutting down the single node cluster of clickhouse, so this can be baked into the install script
  • No new configurations for clickhouse 21.8

Workflow to get this done:

  1. Put up a PR in self-hosted to upgrade the clickhouse images
  2. Test it on https://self-hosted.getsentry.net
  3. Put up a public notice in the develop docs to notify people of this upgrade, especially the folks using their own ClickHouse setup.
  4. Merge in PR and release
  5. Mention in release notes

hubertdeng123 avatar Oct 24 '23 21:10 hubertdeng123

Going to attempt to perform this upgrade in prod after backing up clickhouse containers. Using the steps outlined here

hubertdeng123 avatar Nov 01 '23 22:11 hubertdeng123

Not yet updating to >21.10.2.15, but making progress!

https://github.com/getsentry/self-hosted/pull/2536

hubertdeng123 avatar Nov 02 '23 19:11 hubertdeng123

New request at #2741

williamdes avatar Feb 21 '24 21:02 williamdes

Commenting here that the newest clickhouse versions have ARM images, which would be great for us!

hubertdeng123 avatar Feb 22 '24 18:02 hubertdeng123

Hello,

I was wondering what is blocking upgrades to v22 or v23 of ClickHouse? https://hub.docker.com/r/altinity/clickhouse-server/tags?page=1&name=22 https://hub.docker.com/r/altinity/clickhouse-server/tags?page=1&name=23

As far as I understand, we are all using a two-year-old version. What are the impacts of upgrading? Where can I find the code samples that interact with CH?

williamdes avatar Mar 16 '24 10:03 williamdes

As far as I understand, we are all using a two-year-old version. What are the impacts of upgrading?

ClickHouse has two release tracks: LTS and stable. At my company, I use the stable ones, since I'm too lazy to handle big breaking changes each time an LTS version is released. But since 2021 or so, I haven't hit any breaking change that broke my app with ClickHouse.

This is their changelog: https://github.com/ClickHouse/ClickHouse/blob/master/CHANGELOG.md

Most of the time if there are any "backward incompatible changes", the existing query will be fine, but it won't do anything.

Where can I find the code samples that interact with CH?

I found some here: https://github.com/getsentry/snuba/blob/master/snuba/replacers/errors_replacer.py. It is executed from here: https://github.com/getsentry/snuba/blob/338ae983506f787852c07d16e13a544bb64c5055/snuba/replacer.py#L348-L397

aldy505 avatar Mar 19 '24 08:03 aldy505

And the Rust version: https://github.com/getsentry/snuba/blob/359878fbe030a63945914ef05e705224680b453c/rust_snuba/src/strategies/clickhouse.rs#L61

williamdes avatar Mar 19 '24 09:03 williamdes

ClickHouse has two release tracks: LTS and stable. At my company, I use the stable ones, since I'm too lazy to handle big breaking changes each time an LTS version is released. But since 2021 or so, I haven't hit any breaking change that broke my app with ClickHouse.

Can you confirm the exact version that is working with Sentry on your setup? Maybe I can also bump and confirm that it works fine too. It may end up in a bump for self-hosted.

williamdes avatar Mar 19 '24 09:03 williamdes

ClickHouse has two release tracks: LTS and stable. At my company, I use the stable ones, since I'm too lazy to handle big breaking changes each time an LTS version is released. But since 2021 or so, I haven't hit any breaking change that broke my app with ClickHouse.

Can you confirm the exact version that is working with Sentry on your setup? Maybe I can also bump and confirm that it works fine too. It may end up in a bump for self-hosted.

I'm using the default value in the repo right now. The one thing that's different in my deployment is that I switched from Kafka to Redpanda.

aldy505 avatar Mar 19 '24 10:03 aldy505

I'm using the default value in the repo right now. The one thing that's different in my deployment is that I switched from Kafka to Redpanda.

Very cool! Could you open a PR to share the implementation details?

williamdes avatar Mar 19 '24 12:03 williamdes

I'm using the default value in the repo right now. The one thing that's different in my deployment is that I switched from Kafka to Redpanda.

Very cool! Could you open a PR to share the implementation details?

It's on Sentry's Discord: https://discord.com/channels/621778831602221064/796028405833007104/1201076383426809948

aldy505 avatar Mar 19 '24 12:03 aldy505