self-hosted
Upgrade ClickHouse to 21.10.2.15-stable or later
Problem Statement
Seven RCE and DoS vulnerabilities were recently disclosed in the ClickHouse DBMS. More details here: https://jfrog.com/blog/7-rce-and-dos-vulnerabilities-found-in-clickhouse-dbms/
Fix
To fix the issues, update ClickHouse to v21.10.2.15-stable or later.
If upgrading is not possible, add firewall rules on the server that restrict access to the HTTP port (8123) and the native TCP port (9000) to specific clients only.
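As a rough aid for the "21.10.2.15 or later" rule, here is a minimal sketch that compares a version string against that threshold. It only compares the leading numeric components and ignores suffixes like "-stable" or ".altinitydev.arm". It does not know about fixes backported to LTS branches (21.8 is discussed later in this thread), so a False result for an LTS build does not by itself prove the build is vulnerable.

```python
import re

# Threshold taken from the advisory quoted above.
FIXED_VERSION = (21, 10, 2, 15)


def parse_version(version: str) -> tuple:
    """Extract the leading dotted numeric components, e.g. '21.8.12.29'."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_threshold(version: str, fixed: tuple = FIXED_VERSION) -> bool:
    """True if `version` is at or above `fixed` (plain tuple comparison)."""
    return parse_version(version) >= fixed


if __name__ == "__main__":
    for v in ["20.3.9.70", "21.8.12.29.altinitydev.arm", "v21.10.2.15-stable"]:
        print(v, meets_threshold(v))
```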
Solution Brainstorm
The ClickHouse version needs to be bumped here: https://github.com/getsentry/self-hosted/blob/master/docker-compose.yml#L189
Seems like an easy PR @0xr1. 😉
Not sure how strictly we need/want to align the ClickHouse version between SaaS and self-hosted, though... cf. #1097
restrict the access to the web port (8123) and the TCP server’s port (9000) to specific clients only
Are these locked down in any way in a stock self-hosted install? I think not, but if they aren't, maybe there is something we can do to lock them down in case upgrading ClickHouse isn't as easy as it seems.
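One way to answer that empirically is to try connecting to both ports from a machine that should not have access. The sketch below is a minimal, assumption-laden check: it only tests whether a TCP connection succeeds, not whether ClickHouse is actually answering behind the port, and the host you pass in is up to you.

```python
import socket

# Ports named in the advisory: HTTP interface and native TCP protocol.
CLICKHOUSE_PORTS = {"http": 8123, "native": 9000}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False


def check_clickhouse_exposure(host: str) -> dict:
    """Map each ClickHouse port name to whether it accepted a connection."""
    return {name: port_open(host, port) for name, port in CLICKHOUSE_PORTS.items()}
```

Usage: run `check_clickhouse_exposure("<your Sentry host>")` from an untrusted network; if firewall rules are in place, every value should be False.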
I asked around internally about this and did some digging:
- The version we're pinned to in self-hosted tracks the minimum version we're pinned to in Snuba: 20.3.9.70.
- In dev envs we switch to altinity/clickhouse-server:21.6.1.6734-testing-arm under ARM, because there are no stable builds supporting ARM.
- Altinity is a third-party ClickHouse hosting company (TIL). We are migrating to their Altinity Stable Build, which tracks upstream ClickHouse, sometimes adding critical fixes. The main benefit is that these builds are known to have been operated by Altinity for their cloud offering, and they include docs on how to upgrade safely.
- We're trying to move everything from stock 20.3 to Altinity 21.8. We have an initial cluster stood up, but it's not yet serving errors/transactions in SaaS. There are unresolved issues (internal doc) with versions > 20.3.9.70, so it's a no-go for self-hosted at this point.
- There are multiple breaking changes between 20.3 and 21.8; it is not a simple drop-in migration even if we were already on Altinity 20.3 (vs. stock).
- The path we are taking to upgrade errors/transactions in SaaS is not the same one self-hosted customers will take. We are currently doing dual writes to avoid landing in a situation where we don't know how to recover. We will eventually get to discussing how to upgrade in place, which would be applicable to self-hosted.
tl;dr It seems we're a ways off from moving past 20.3.
Is there an Altinity Stable 20.3 that has security fixes and would work for self-hosted?
It sounds like SaaS has moved to 21.8, so the hard part of the work is out of the way. For ARM, I think we can use the stable image altinity/clickhouse-server:21.8.12.29.altinitydev.arm, and whatever version of ClickHouse SaaS uses for x86. It sounds like the upgrade is not straightforward, so we may need to make a hard stop soon (which could also be useful for https://github.com/getsentry/self-hosted/pull/1703).
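The per-architecture choice described above could be expressed roughly like this. It is a hypothetical sketch: the ARM tag is the one suggested in the comment, while the x86 tag is deliberately left as a caller-supplied argument because the thread only says "whatever version SaaS uses".

```python
import platform
from typing import Optional

# ARM tag suggested in the thread; the x86 tag is NOT known here.
ARM_IMAGE = "altinity/clickhouse-server:21.8.12.29.altinitydev.arm"
# platform.machine() reports "arm64" on Apple Silicon macOS, "aarch64" on ARM Linux.
ARM_MACHINES = {"arm64", "aarch64"}


def choose_clickhouse_image(x86_image: str, machine: Optional[str] = None) -> str:
    """Return ARM_IMAGE on ARM hosts, otherwise the caller-supplied x86 image."""
    machine = (machine or platform.machine()).lower()
    return ARM_IMAGE if machine in ARM_MACHINES else x86_image
```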
As far as I can tell, since 21.8 is an LTS release, it got the security patches for the above-mentioned CVEs.
Update
- Prod is on 21.8; it's difficult for Sentry devs on M1 Macs to test features against CH 20.3 for compatibility
- Some future changes might be blocked on the CH upgrade
- Replays broke on CH 20.3; what's to say that won't happen again in the future?
- It feels like a very low-priority item for SnS to internally validate whether data ingested on 20.3 is compatible with 21.8, and we don't know if they will ever get to it (slack thread🔒)
- ClickHouse vulnerabilities in the current version (CVEs)
Notes from talking with the SnS team:
- We should be able to go directly to 21.8 after shutting down the single-node ClickHouse cluster, so this can be baked into the install script
- No new configuration is needed for ClickHouse 21.8
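A hypothetical sketch of what "shut down, then start on 21.8" could look like as an install-script step. The service name `clickhouse` and the use of `docker compose` are assumptions, not taken from the actual self-hosted install script, and it assumes the compose file already points at the new image. By default it only prints the commands.

```python
import subprocess

# Hypothetical service name; check it against docker-compose.yml.
DEFAULT_SERVICE = "clickhouse"


def clickhouse_upgrade_commands(service: str = DEFAULT_SERVICE) -> list:
    """Ordered commands: fully stop the old server, then recreate it."""
    return [
        ["docker", "compose", "stop", service],      # shut down the single-node server
        ["docker", "compose", "up", "-d", service],  # recreate it from the configured image
    ]


def run_upgrade(service: str = DEFAULT_SERVICE, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for cmd in clickhouse_upgrade_commands(service):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Stopping first matters because the old and new server versions should never run against the same data directory at the same time.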
Workflow to get this done:
- Put up a PR in self-hosted to upgrade the ClickHouse images
- Test it on https://self-hosted.getsentry.net
- Put up a public notice in the develop docs to notify people of this upgrade, especially folks using their own ClickHouse setup
- Merge the PR and release
- Mention it in the release notes
I'm going to attempt this upgrade in prod after backing up the ClickHouse containers, using the steps outlined here.
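For the backup step, a minimal sketch using Python's `tarfile` is below. It assumes the ClickHouse server is stopped (a file-level copy of a running server may be inconsistent) and that the data is reachable as a plain host directory. How that maps onto the Docker volume used by self-hosted is not covered here.

```python
import tarfile
import time
from pathlib import Path


def backup_data_dir(data_dir: str, dest_dir: str) -> Path:
    """Write a timestamped .tar.gz of data_dir into dest_dir and return its path."""
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no such directory: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = Path(dest_dir) / f"clickhouse-backup-{stamp}.tar.gz"
    archive.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "w:gz") as tar:
        # Store entries under the directory's own name, not its absolute path.
        tar.add(src, arcname=src.name)
    return archive
```

Usage (hypothetical destination path): `backup_data_dir("/var/lib/clickhouse", "/backups")`, where `/var/lib/clickhouse` is ClickHouse's default data path.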
Not yet updating to >21.10.2.15, but making progress!
https://github.com/getsentry/self-hosted/pull/2536
New request at #2741
Commenting here that the newest ClickHouse versions have ARM images, which would be great for us!
Hello,
I was wondering what is blocking upgrades to ClickHouse v22 or v23? https://hub.docker.com/r/altinity/clickhouse-server/tags?page=1&name=22 https://hub.docker.com/r/altinity/clickhouse-server/tags?page=1&name=23
As far as I understand, we are all using a two-year-old version. What are the impacts of upgrading? Where can I find code samples that interact with CH?
As far as I understand, we are all using a two-year-old version. What are the impacts of upgrading?
ClickHouse has two release channels: LTS and stable. At my company, I use the stable releases, since I'm too lazy to handle big breaking changes whenever a new LTS version is released. But since 2021 or so, I haven't hit any breaking change that broke my app with ClickHouse.
This is their changelog: https://github.com/ClickHouse/ClickHouse/blob/master/CHANGELOG.md
Most of the time, even when there are "backward incompatible changes", existing queries keep running; the affected part just may no longer do anything.
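To gauge the impact of an upgrade, one option is to pull just the backward-incompatible entries out of the changelog. This is a minimal sketch that assumes the changelog marks them with headings containing "Backward Incompatible Change" followed by "* " or "- " bullets; verify that against the actual file before relying on it.

```python
def backward_incompatible_items(changelog: str) -> list:
    """Collect bullet items under headings mentioning 'Backward Incompatible Change'."""
    items = []
    in_section = False
    for line in changelog.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Every heading starts a new section; track whether it is the one we want.
            in_section = "backward incompatible change" in stripped.lower()
        elif in_section and stripped.startswith(("* ", "- ")):
            items.append(stripped[2:])
    return items
```

Usage: download CHANGELOG.md from the link above, read it into a string, and pass it in. Note that this collects entries from every release in the file, not just those between your current and target versions.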
Where can I find code samples that interact with CH?
I found some here: https://github.com/getsentry/snuba/blob/master/snuba/replacers/errors_replacer.py; they are executed from here: https://github.com/getsentry/snuba/blob/338ae983506f787852c07d16e13a544bb64c5055/snuba/replacer.py#L348-L397
And the Rust version: https://github.com/getsentry/snuba/blob/359878fbe030a63945914ef05e705224680b453c/rust_snuba/src/strategies/clickhouse.rs#L61
Can you confirm the exact version that is working with Sentry on your setup? Maybe I can also bump it and confirm that it works fine too. It may end up as a version bump for self-hosted.
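To confirm the exact running version, you can send `SELECT version()` to ClickHouse's HTTP interface. The sketch below assumes the server is reachable on port 8123 and that the default user with no password is allowed; adjust host and credentials for your setup.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def version_url(host: str = "localhost", port: int = 8123) -> str:
    """Build an HTTP-interface URL that runs SELECT version()."""
    return f"http://{host}:{port}/?" + urlencode({"query": "SELECT version()"})


def clickhouse_version(host: str = "localhost", port: int = 8123, timeout: float = 5.0) -> str:
    """Fetch the server's version string, e.g. to compare setups."""
    with urlopen(version_url(host, port), timeout=timeout) as resp:
        return resp.read().decode().strip()
```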
I'm using the default version from the repo right now. The only difference in my deployment is that I switched from Kafka to Redpanda.
Very cool! Could you open a PR to share the implementation details?
It's on Sentry's Discord: https://discord.com/channels/621778831602221064/796028405833007104/1201076383426809948