feat: ensure no low cardinality snapshot source in any deployment
I initially created the snapshot source column as a LowCardinality nullable string, since we know it can be null or one of a small number of values... ClickHouse really didn't like that.
We've cleared this up on CH cloud, but there could be hobby or dev deployments that still have LowCardinality on some of these tables. Since we're not using the values in the column, it should be safe to drop and recreate the column so we know it has no LowCardinality references anywhere.
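The drop-and-recreate boils down to a pair of ALTERs per affected table. A minimal sketch of the idea, not the exact migration SQL in this branch (the ON CLUSTER clause is an assumption; the column type matches the sharded table output further down):

ALTER TABLE sharded_session_replay_events ON CLUSTER '{cluster}'
    DROP COLUMN IF EXISTS snapshot_source;
-- re-add without any LowCardinality wrapping inside the aggregate
ALTER TABLE sharded_session_replay_events ON CLUSTER '{cluster}'
    ADD COLUMN IF NOT EXISTS snapshot_source
        AggregateFunction(argMin, Nullable(String), DateTime64(6, 'UTC'));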
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.
pinging @fuziontech (hopefully not literally at 9am UTC 🤣)
maybe @benjackwhite has time to test this too
SELECT
type ilike '%LowCardinality%' AS has_low_cardinality,
type AS column_type,
name AS column_name,
table
FROM system.columns
WHERE
database = 'default' AND
lower(name) = 'snapshot_source'
ORDER BY table, name;
Running that should print some columns with LowCardinality still in the definition.
Then run the migration in this branch.
Everything should still work, and re-running the above query should show no columns with LowCardinality in the definition.
This all worked as you say except that the migrate command timed out the first time (on the 5th command) and the second time on the 1st command...
And now my migrations are completely stuck...
@daibhin we nuked Ben's environment and the migration then worked... that's not great if this breaks a self-hosted deploy and the only advice is to delete everything 😅
can you test locally and see what happens for you?
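For anyone else who hits the timeout: one way to check whether the underlying ALTER is genuinely stuck rather than just slow is ClickHouse's system.mutations table. A sketch, assuming everything lives in the default database:

SELECT table, command, parts_to_do, is_done, latest_fail_reason
FROM system.mutations
WHERE database = 'default' AND is_done = 0;

Unfinished mutations show up with is_done = 0, and latest_fail_reason is usually a less drastic first clue than nuking the environment.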
Running the query the first time:
┌─has_low_cardinality─┬─column_type───────────────────────────────────────────────────────────────────────┬─column_name─────┬─table──────────────────────────┐
│                   1 │ LowCardinality(Nullable(String))                                                  │ snapshot_source │ kafka_session_replay_events    │
│                   1 │ AggregateFunction(argMin, LowCardinality(Nullable(String)), DateTime64(6, 'UTC')) │ snapshot_source │ session_replay_events          │
│                   1 │ AggregateFunction(argMin, LowCardinality(Nullable(String)), DateTime64(6, 'UTC')) │ snapshot_source │ session_replay_events_mv       │
│                   1 │ AggregateFunction(argMin, LowCardinality(Nullable(String)), DateTime64(6, 'UTC')) │ snapshot_source │ sharded_session_replay_events  │
│                   1 │ AggregateFunction(argMin, LowCardinality(Nullable(String)), DateTime64(6, 'UTC')) │ snapshot_source │ writable_session_replay_events │
└─────────────────────┴───────────────────────────────────────────────────────────────────────────────────┴─────────────────┴────────────────────────────────┘
5 rows in set. Elapsed: 0.005 sec.
Ran the migration - all went well
2024-02-01T12:50:21.728088Z [info ] Applying migration 0052_session_replay_source_ensure_high_cardinality... [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:21.728171Z [info ] Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:21.850276Z [info ] Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.141114Z [info ] Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.257037Z [info ] Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.380275Z [info ] Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.500814Z [info ] Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.609655Z [info ] Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.722659Z [info ] Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.841519Z [info ] Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.965506Z [info ] Executing python operation run_sql [migrations] pid=11665 tid=8016367616
✅ Migration successful
Looks like the LowCardinality references are gone:
┌─has_low_cardinality─┬─column_type───────────────────────────────────────────────────┬─column_name─────┬─table──────────────────────────┐
│                   0 │ Nullable(String)                                                  │ snapshot_source │ kafka_session_replay_events    │
│                   0 │ AggregateFunction(argMin, Nullable(String), DateTime64(6, 'UTC')) │ snapshot_source │ session_replay_events          │
│                   0 │ AggregateFunction(argMin, Nullable(String), DateTime64(6, 'UTC')) │ snapshot_source │ session_replay_events_mv       │
│                   0 │ AggregateFunction(argMin, Nullable(String), DateTime64(6, 'UTC')) │ snapshot_source │ sharded_session_replay_events  │
│                   0 │ AggregateFunction(argMin, Nullable(String), DateTime64(6, 'UTC')) │ snapshot_source │ writable_session_replay_events │
└─────────────────────┴───────────────────────────────────────────────────────────────────┴─────────────────┴────────────────────────────────┘
5 rows in set. Elapsed: 0.004 sec.
Random question @pauldambra: I queried CH by running clickhouse client on the Exec tab of the container in the Docker UI. Is there a better way to connect to a CH shell locally?
(btw the training is paying dividends because I would not have been able to do that yesterday)
I've seen people using TablePlus. I use PyCharm, which has a DB explorer built in. JetBrains also have a dedicated DB tool called DataGrip iirc.
I'm not sure what else people use
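One more option, assuming the standard docker compose setup where the service is called clickhouse (an assumption, I haven't checked this repo's compose file):

docker compose exec clickhouse clickhouse-client

That lands you in the same shell without going through the Docker UI.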
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.
This PR was closed due to lack of activity. Feel free to reopen if it's still relevant.