
0030_created_at_persons_and_groups_on_events.py wants to add a column to "writable_events", but this table does not exist when CLICKHOUSE_REPLICATION=false

allan-simon opened this issue 3 years ago · 7 comments

Bug description

  1. I'm trying to start a fresh setup with 2 machines: one containing the datastores (ClickHouse + Kafka), and one containing the code.

How to reproduce

  1. Use the following docker-compose.yml:
version: "3"

services:
    redis:
        # pinned to version 6 as the latest one is 7+, not supported by the Posthog pinned version below
        image: "redis:6-alpine"
        container_name: posthog_redis
        restart: always
    web: &web
        image: posthog/posthog:latest-release
        restart: on-failure
        environment:
            SKIP_SERVICE_VERSION_REQUIREMENTS: 'true'
            DISABLE_SECURE_SSL_REDIRECT: 'true'
            DATABASE_URL: "URL_TO_YOUR_POSTGRES"
            CLICKHOUSE_HOST: 'IP_OF_CLICKHOUSE'
            CLICKHOUSE_SECURE: 'false'
            CLICKHOUSE_VERIFY: 'false'
            CLICKHOUSE_REPLICATION: 'false'
            KAFKA_URL: 'kafka://IP_OF_KAFKA'
            REDIS_URL: 'redis://redis:6379/'
            SECRET_KEY: "somethingsomething"
            SITE_URL: "https://something.example.com"
            DEPLOYMENT: hello_i_am_trapped_in_an_environment_variable
        depends_on:
            - redis
    worker:
        <<: *web
        image: posthog/posthog:latest-release
        command: ./bin/docker-worker-celery --with-scheduler
    plugins:
        image: posthog/posthog:latest-release
        command: ./bin/plugin-server --no-restart-loop
        restart: on-failure
        environment:
            SKIP_SERVICE_VERSION_REQUIREMENTS: 'true'
            DISABLE_SECURE_SSL_REDIRECT: 'true'
            SECRET_KEY: "somethingsomething"
            DATABASE_URL: "URL"
            KAFKA_HOSTS: '10.40.0.4:9092'
            REDIS_URL: 'redis://redis:6379/'
            CLICKHOUSE_HOST: 'IP_OF_CLICKHOUSE'
            CLICKHOUSE_SECURE: 'false'
            CLICKHOUSE_VERIFY: 'false'
            CLICKHOUSE_REPLICATION: 'false'
        depends_on:
            - redis

    asyncmigrationscheck:
        <<: *web
        command: python manage.py run_async_migrations --check
        restart: on-failure

  2. Run docker-compose up

The web container will then display:

web_1                   | Traceback (most recent call last):
web_1                   |   File "manage.py", line 21, in <module>
web_1                   |     main()
web_1                   |   File "manage.py", line 17, in main
web_1                   |     execute_from_command_line(sys.argv)
web_1                   |   File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
web_1                   |     utility.execute()
web_1                   |   File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 413, in execute
web_1                   |     self.fetch_command(subcommand).run_from_argv(self.argv)
web_1                   |   File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 354, in run_from_argv
web_1                   |     self.execute(*args, **cmd_options)
web_1                   |   File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 398, in execute
web_1                   |     output = self.handle(*args, **options)
web_1                   |   File "/code/posthog/management/commands/migrate_clickhouse.py", line 42, in handle
web_1                   |     self.migrate(CLICKHOUSE_HTTP_URL, options)
web_1                   |   File "/code/posthog/management/commands/migrate_clickhouse.py", line 80, in migrate
web_1                   |     database.migrate(MIGRATIONS_PACKAGE_NAME, options["upto"], replicated=CLICKHOUSE_REPLICATION)
web_1                   |   File "/usr/local/lib/python3.8/site-packages/infi/clickhouse_orm/database.py", line 352, in migrate
web_1                   |     operation.apply(self)
web_1                   |   File "/usr/local/lib/python3.8/site-packages/infi/clickhouse_orm/migrations.py", line 250, in apply
web_1                   |     self._func(database)
web_1                   |   File "/code/posthog/clickhouse/migrations/0030_created_at_persons_and_groups_on_events.py", line 21, in add_columns_to_required_tables
web_1                   |     sync_execute(ADD_COLUMNS_BASE_SQL.format(table="writable_events", cluster=CLICKHOUSE_CLUSTER))
web_1                   |   File "/code/posthog/client.py", line 161, in sync_execute
web_1                   |     raise err
web_1                   | posthog.errors.CHQueryErrorUnknownTable: Code: 60.
web_1                   | DB::Exception: There was an error on [10.40.0.4:9000]: Code: 60. DB::Exception: Table default.writable_events doesn't exist. (UNKNOWN_TABLE) (version 22.8.5.29 (official build)). Stack trace:
web_1                   | 
web_1                   | 0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x8dcd368 in /usr/bin/clickhouse
web_1                   | 1. DB::DDLQueryStatusSource::generate() @ 0x11da27ac in /usr/bin/clickhouse
web_1                   | 2. DB::ISource::tryGenerate() @ 0x127898ac in /usr/bin/clickhouse
web_1                   | 3. DB::ISource::work() @ 0x12789390 in /usr/bin/clickhouse
web_1                   | 4. DB::ExecutionThreadContext::executeTask() @ 0x127a35b4 in /usr/bin/clickhouse
web_1                   | 5. DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*) @ 0x1279862c in /usr/bin/clickhouse
web_1                   | 6. DB::PipelineExecutor::executeImpl(unsigned long) @ 0x12797ccc in /usr/bin/clickhouse
web_1                   | 7. DB::PipelineExecutor::execute(unsigned long) @ 0x12797a94 in /usr/bin/clickhouse
web_1                   | 8. ? @ 0x127a6e0c in /usr/bin/clickhouse
web_1                   | 9. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x8e7f950 in /usr/bin/clickhouse
web_1                   | 10. ? @ 0x8e82844 in /usr/bin/clickhouse
web_1                   | 11. start_thread @ 0x7624 in /usr/lib/aarch64-linux-gnu/libpthread-2.31.so
web_1                   | 12. ? @ 0xd149c in /usr/lib/aarch64-linux-gnu/libc-2.31.so

Environment

  • [ ] PostHog Cloud
  • [x] self-hosted PostHog, version/commit: please provide

Additional context

It seems that this migration hardcodes the table names:

def add_columns_to_required_tables(_):
    sync_execute(ADD_COLUMNS_BASE_SQL.format(table="events", cluster=CLICKHOUSE_CLUSTER))
    sync_execute(ADD_COLUMNS_BASE_SQL.format(table="writable_events", cluster=CLICKHOUSE_CLUSTER))
    sync_execute(ADD_COLUMNS_BASE_SQL.format(table="sharded_events", cluster=CLICKHOUSE_CLUSTER))

while in other places we have:

posthog/models/event/sql.py

WRITABLE_EVENTS_DATA_TABLE = lambda: "writable_events" if settings.CLICKHOUSE_REPLICATION else EVENTS_DATA_TABLE()

Manually removing these lines in the migration seems to solve the issue:

    sync_execute(ADD_COLUMNS_BASE_SQL.format(table="writable_events", cluster=CLICKHOUSE_CLUSTER))
    sync_execute(ADD_COLUMNS_BASE_SQL.format(table="sharded_events", cluster=CLICKHOUSE_CLUSTER))

So I guess they should be behind an if?
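
Something like this, say (an untested sketch on my side: it reuses the names already present in the migration file and checks settings.CLICKHOUSE_REPLICATION the same way the posthog/models/event/sql.py line quoted above does):

from django.conf import settings

def add_columns_to_required_tables(_):
    # the distributed "events" table exists whether or not replication is on
    sync_execute(ADD_COLUMNS_BASE_SQL.format(table="events", cluster=CLICKHOUSE_CLUSTER))

    # writable_events / sharded_events are only created when replication is enabled,
    # so only alter them in that case
    if settings.CLICKHOUSE_REPLICATION:
        sync_execute(ADD_COLUMNS_BASE_SQL.format(table="writable_events", cluster=CLICKHOUSE_CLUSTER))
        sync_execute(ADD_COLUMNS_BASE_SQL.format(table="sharded_events", cluster=CLICKHOUSE_CLUSTER))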

Thank you for your bug report – we love squashing them!

allan-simon · Sep 18 '22 20:09

Hi! Did you perhaps have some previous docker-compose volumes lying around?

If so, run docker-compose rm -v and try again.

macobo · Sep 19 '22 07:09

@macobo thanks for your reply. As you can see in my docker-compose.yml, I don't have any volumes,

but every time I have been doing a drop database on Postgres and ClickHouse (which are installed on RDS and a dedicated EC2 instance, respectively).

allan-simon · Sep 19 '22 07:09

Thanks for the clarification! I suspect there might be something missing.

  1. What's the command you're using for dropping the database on ClickHouse? Note that ClickHouse doesn't actually drop replicated data immediately, and it doesn't clear ZooKeeper, potentially causing issues if the same database/paths are reused.
  2. Could you check the ClickHouse system.tables table after the drop for any infi_-related tables? (A query sketch follows this list.)
  3. Could you output the full logs for the web pod during migration?
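
For point 2, something like the following, run from python manage.py shell in the web container, should do it (just a sketch; it reuses the sync_execute helper from posthog/client.py that appears in your traceback):

# list any leftover infi_clickhouse_orm bookkeeping tables after the drop
from posthog.client import sync_execute

leftover = sync_execute("SELECT database, name, engine FROM system.tables WHERE startsWith(name, 'infi_')")
for database, name, engine in leftover:
    print(database, name, engine)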

In general, though, docker-compose isn't suited for a lot of customization (as you're doing here), and issues might arise as our stack evolves. I would recommend checking out our k8s-based installation as well, which is better suited for these setups: https://posthog.com/docs/self-host and https://github.com/PostHog/charts-clickhouse/

macobo · Sep 19 '22 07:09

I will check that. I will try on a dedicated new machine so that we're sure there's no state floating around.

As for K8s, it would really increase the cost for nothing (I have a single ClickHouse node).

Actually, I was thinking more of going in the opposite direction and seeing if PostHog could work without a "cluster" (as it's a single node anyway).

allan-simon · Sep 19 '22 07:09

@macobo to be honest I haven't had time to get a new instance running, but for what it's worth, migration 0027 has this:

def add_columns_to_required_tables(_):
    sync_execute(ADD_COLUMNS_BASE_SQL.format(table="events", cluster=CLICKHOUSE_CLUSTER))

    if clickhouse_is_replicated():
        sync_execute(ADD_COLUMNS_BASE_SQL.format(table="writable_events", cluster=CLICKHOUSE_CLUSTER))
        sync_execute(ADD_COLUMNS_BASE_SQL.format(table="sharded_events", cluster=CLICKHOUSE_CLUSTER))

Adding this check in 0030 makes it work, so I don't see why it would be needed for 0027 but not for 0030 (especially as not having it creates a bug on my side).

allan-simon · Sep 19 '22 22:09

Hi! Just saw your PR.

This is missing cleanup on our end, but we're planning on removing the CLICKHOUSE_REPLICATION environment variable going forward, so the fix would land you in a situation where you're unable to upgrade. Note that setting it to true doesn't actually require setting up a full cluster; however, it does require a working ZooKeeper setup.

macobo · Sep 20 '22 06:09

Ahhh, fortunately you told me that before I upgraded our current installation ^^' So I guess I'm back to square one (I got other issues with the migrations when turning it to true... so expect other issues from me ^^")

allan-simon · Sep 20 '22 07:09