
feat: ensure no low cardinality snapshot source in any deployment

Open · pauldambra opened this issue 1 year ago · 9 comments

I initially created the snapshot source column as a LowCardinality nullable string, since we know it can be null or one of a small number of values... ClickHouse really didn't like that.

We've cleared this up on ClickHouse Cloud, but there could be hobby or dev deployments that still have LowCardinality on some of these columns. Since we're not yet using the values in the column, it should be safe to drop and recreate it so we know it has no LowCardinality references anywhere.
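The actual DDL lives in the migration in this branch; as a rough, hedged sketch of the kind of statements involved (table names are taken from the query output later in this thread, but the `IF EXISTS` / `IF NOT EXISTS` guards and the exact statements are assumptions, not the branch's real migration):

```sql
-- Sketch only: drop the LowCardinality-typed column and recreate it
-- with the same shape minus the LowCardinality wrapper.

-- Kafka/source table: plain nullable string
ALTER TABLE kafka_session_replay_events
    DROP COLUMN IF EXISTS snapshot_source;
ALTER TABLE kafka_session_replay_events
    ADD COLUMN IF NOT EXISTS snapshot_source Nullable(String);

-- Aggregating tables keep the argMin state type, minus LowCardinality
ALTER TABLE sharded_session_replay_events
    DROP COLUMN IF EXISTS snapshot_source;
ALTER TABLE sharded_session_replay_events
    ADD COLUMN IF NOT EXISTS snapshot_source
        AggregateFunction(argMin, Nullable(String), DateTime64(6, 'UTC'));
```

Dropping and re-adding is safe here precisely because the column's values aren't read anywhere yet; on a deployment that already consumed the column this would lose data.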

pauldambra · Jan 24 '24 22:01

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

posthog-bot · Feb 01 '24 07:02

pinging @fuziontech (hopefully not literally at 9am UTC 🤣)

pauldambra · Feb 01 '24 09:02

maybe @benjackwhite has time to test this too

```sql
SELECT
    type ilike '%LowCardinality%' AS has_low_cardinality,
    type AS column_type,
    name AS column_name,
    table
FROM system.columns
WHERE
    database = 'default' AND
    lower(name) = 'snapshot_source'
ORDER BY table, name;
```

Running that should list some columns that still have LowCardinality in their definition.

Then run the migration in this branch.

Everything should still work, and re-running the above query should show no columns with LowCardinality in the definition.

pauldambra · Feb 01 '24 09:02

> maybe @benjackwhite has time to test this too
>
> [the `system.columns` query from the previous comment]
>
> running that should print some columns with LowCardinality still in the definition. Then run the migration in this branch. Everything should still work and re-running the above query should show no columns with low cardinality in the definition

This all worked as you say except that the migrate command timed out the first time (on the 5th command) and the second time on the 1st command...

And now my migrations are completely stuck...
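(For anyone else who hits this: a general ClickHouse debugging step, not something from this branch, is to check `system.mutations`, since `ALTER ... DROP COLUMN` runs as a mutation and a stuck one will show up there.)

```sql
-- Show mutations that haven't finished, with any failure reason
SELECT table, mutation_id, command, is_done, latest_fail_reason
FROM system.mutations
WHERE NOT is_done;

-- If one is wedged beyond repair it can be cancelled
-- (assumption: you're happy to abandon that mutation):
-- KILL MUTATION WHERE mutation_id = '<id from the query above>';
```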

benjackwhite · Feb 01 '24 09:02

@daibhin we nuked Ben's environment and the migration then worked... that's not great: if this breaks a self-hosted deploy, "delete everything" can't be the only advice 🙈

can you test locally and see what happens for you?

pauldambra · Feb 01 '24 12:02

Running the query the first time:

```
┌─has_low_cardinality─┬─column_type───────────────────────────────────────────────────────────────────────┬─column_name─────┬─table──────────────────────────┐
│                   1 │ LowCardinality(Nullable(String))                                                  │ snapshot_source │ kafka_session_replay_events    │
│                   1 │ AggregateFunction(argMin, LowCardinality(Nullable(String)), DateTime64(6, 'UTC')) │ snapshot_source │ session_replay_events          │
│                   1 │ AggregateFunction(argMin, LowCardinality(Nullable(String)), DateTime64(6, 'UTC')) │ snapshot_source │ session_replay_events_mv       │
│                   1 │ AggregateFunction(argMin, LowCardinality(Nullable(String)), DateTime64(6, 'UTC')) │ snapshot_source │ sharded_session_replay_events  │
│                   1 │ AggregateFunction(argMin, LowCardinality(Nullable(String)), DateTime64(6, 'UTC')) │ snapshot_source │ writable_session_replay_events │
└─────────────────────┴───────────────────────────────────────────────────────────────────────────────────┴─────────────────┴────────────────────────────────┘

5 rows in set. Elapsed: 0.005 sec.
```

Ran the migration - all went well:

```
2024-02-01T12:50:21.728088Z [info     ] Applying migration 0052_session_replay_source_ensure_high_cardinality... [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:21.728171Z [info     ]     Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:21.850276Z [info     ]     Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.141114Z [info     ]     Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.257037Z [info     ]     Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.380275Z [info     ]     Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.500814Z [info     ]     Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.609655Z [info     ]     Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.722659Z [info     ]     Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.841519Z [info     ]     Executing python operation run_sql [migrations] pid=11665 tid=8016367616
2024-02-01T12:50:25.965506Z [info     ]     Executing python operation run_sql [migrations] pid=11665 tid=8016367616
✅ Migration successful
```

Looks like the LowCardinality references are gone:

```
┌─has_low_cardinality─┬─column_type───────────────────────────────────────────────────────┬─column_name─────┬─table──────────────────────────┐
│                   0 │ Nullable(String)                                                  │ snapshot_source │ kafka_session_replay_events    │
│                   0 │ AggregateFunction(argMin, Nullable(String), DateTime64(6, 'UTC')) │ snapshot_source │ session_replay_events          │
│                   0 │ AggregateFunction(argMin, Nullable(String), DateTime64(6, 'UTC')) │ snapshot_source │ session_replay_events_mv       │
│                   0 │ AggregateFunction(argMin, Nullable(String), DateTime64(6, 'UTC')) │ snapshot_source │ sharded_session_replay_events  │
│                   0 │ AggregateFunction(argMin, Nullable(String), DateTime64(6, 'UTC')) │ snapshot_source │ writable_session_replay_events │
└─────────────────────┴───────────────────────────────────────────────────────────────────┴─────────────────┴────────────────────────────────┘

5 rows in set. Elapsed: 0.004 sec.
```

daibhin · Feb 01 '24 12:02

Random question @pauldambra: I queried CH by running `clickhouse client` in the Exec tab of the container in the Docker UI. Is there a better way to connect to a CH shell locally?

(btw the training is paying dividends, because I would not have been able to do that yesterday)

daibhin · Feb 01 '24 12:02

> I queried CH by running clickhouse client on Exec tab of the container in Docker UI. Is there a better way to connect to a CH shell locally?

I've seen people using TablePlus. I use PyCharm, which has a DB explorer built in. JetBrains also have a dedicated DB tool called DataGrip, IIRC.

I'm not sure what else people use.
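If the local stack runs under docker compose, you can also get a ClickHouse shell straight from a terminal instead of the Docker UI (the service name `clickhouse` is an assumption; check your compose file):

```shell
# Open an interactive clickhouse-client inside the running container
docker compose exec clickhouse clickhouse-client --database default

# Or run a one-off query non-interactively
docker compose exec clickhouse clickhouse-client \
    --query "SELECT name, type FROM system.columns WHERE name = 'snapshot_source'"
```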

pauldambra · Feb 01 '24 12:02

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

posthog-bot · Feb 16 '24 07:02

This PR was closed due to lack of activity. Feel free to reopen if it's still relevant.

posthog-bot · Feb 23 '24 07:02