Failed migrations and KeyError: 'query' after upgrade

Open. sovaa opened this issue 3 years ago • 1 comment

Self-Hosted Version

22.7.0

CPU Architecture

x86_64

Docker Version

19.03.8

Docker Compose Version

1.29.2

Steps to Reproduce

On CentOS 7, upgrade from 21.4.1 to 21.6.3, then to 22.7.0 (as per the "Hard Stops" docs).

The upgrade to 21.6.3 went fine, but the upgrade to 22.7.0 produced errors during migration. Recreating the kafka/zookeeper volumes lets Sentry run normally for a few hours, then the subscription consumers fail with "KeyError: 'query'".

Following the steps mentioned in #1249 (recreating the volumes again) lets it run normally for another few hours, then it fails with the same error again.

Expected Result

Migrations to be fine and incoming errors to not stop being processed.

Actual Result

The errors encountered while migrating from 21.6.3 to 22.7.0, during the "Setting up / migrating database ..." step (full install log in attachments):

ls: cannot access '/usr/local/share/ca-certificates/': Operation not permitted
sentry/requirements.txt is deprecated, use sentry/enhance-image.sh - see https://github.com/getsentry/self-hosted#enhance-sentry-image
stat: cannot statx '/data': Operation not permitted

...and:

  Applying sentry.0233_recreate_subscriptions_in_snuba...Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sentry/migrations/0233_recreate_subscriptions_in_snuba.py", line 29, in migrate_subscriptions
    subscription_id = _create_in_snuba(subscription)
  File "/usr/local/lib/python3.8/site-packages/sentry/snuba/tasks.py", line 192, in _create_in_snuba
    entity_subscription = get_entity_subscription_from_snuba_query(
  File "/usr/local/lib/python3.8/site-packages/sentry/snuba/entity_subscription.py", line 581, in get_entity_subscription_from_snuba_query
    SnubaQuery.Type(snuba_query.type),
AttributeError: 'SnubaQuery' object has no attribute 'type'
07:25:48 [ERROR] root: failed to recreate 0/c2c87862f21d11ec9ae50242ac120002: 'SnubaQuery' object has no attribute 'type'

The consumer errors after running for a few hours are the same as in #1249.

Changes made to docker-compose.yml:

diff --git a/docker-compose.yml b/docker-compose.yml
index c35dc3a..6bb6835 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -18,7 +18,8 @@ x-sentry-defaults: &sentry_defaults
   <<: *restart_policy
   image: sentry-self-hosted-local
   # Set the platform to build for linux/arm64 when needed on Apple silicon Macs.
-  platform: ${DOCKER_PLATFORM:-}
+  #platform: ${DOCKER_PLATFORM:-}
+  #platform: "linux/amd64"
   build:
     context: ./sentry
     args:
@@ -58,6 +59,7 @@ x-sentry-defaults: &sentry_defaults
     PYTHONUSERBASE: "/data/custom-packages"
     SENTRY_CONF: "/etc/sentry"
     SNUBA: "http://snuba-api:1218"
+    GEOIP_PATH_MMDB: '/geoip/GeoLite2-City.mmdb'
     # Force everything to use the system CA bundle
     # This is mostly needed to support installing custom CA certs
     # This one is used by botocore
@@ -68,12 +70,14 @@ x-sentry-defaults: &sentry_defaults
     GRPC_DEFAULT_SSL_ROOTS_FILE_PATH_ENV_VAR: *ca_bundle
     # Leaving the value empty to just pass whatever is set
     # on the host system (or in the .env file)
-    SENTRY_EVENT_RETENTION_DAYS:
+    SENTRY_EVENT_RETENTION_DAYS: 56
     SENTRY_MAIL_HOST:
   volumes:
-    - "sentry-data:/data"
+    - "/data/tncdata/sentry/sentry-data:/data"
+    #- "./sentry-data:/data"
     - "./sentry:/etc/sentry"
-    - "./geoip:/geoip:ro"
+    - "/data/tncdata/sentry/geoip:/geoip"
+    #- "./geoip:/geoip"
     - "./certificates:/usr/local/share/ca-certificates:ro"
 x-snuba-defaults: &snuba_defaults
   <<: *restart_policy
@@ -94,12 +98,13 @@ x-snuba-defaults: &snuba_defaults
     UWSGI_DISABLE_LOGGING: "true"
     # Leaving the value empty to just pass whatever is set
     # on the host system (or in the .env file)
-    SENTRY_EVENT_RETENTION_DAYS:
+    SENTRY_EVENT_RETENTION_DAYS: 56
 services:
   smtp:
     <<: *restart_policy
     image: tianon/exim4
-    hostname: "${SENTRY_MAIL_HOST:-}"
+    #hostname: "${SENTRY_MAIL_HOST:-}"
+    hostname: "sentry2.redacted.com"
     volumes:
       - "sentry-smtp:/var/spool/exim4"
       - "sentry-smtp-log:/var/log/exim4"
@@ -117,7 +122,7 @@ services:
       <<: *healthcheck_defaults
       test: redis-cli ping
     volumes:
-      - "sentry-redis:/data"
+      - "/data/tncdata/sentry/sentry-redis:/data"
     ulimits:
       nofile:
         soft: 10032
@@ -128,7 +133,8 @@ services:
     healthcheck:
       <<: *healthcheck_defaults
       # Using default user "postgres" from sentry/sentry.conf.example.py or value of POSTGRES_USER if provided
-      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-postgres}"]
+      #test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-postgres}"]
+      test: ["CMD-SHELL", "pg_isready -U sentryuser"]
     command:
       [
         "postgres",
@@ -143,7 +149,7 @@ services:
       POSTGRES_HOST_AUTH_METHOD: "trust"
     entrypoint: /opt/sentry/postgres-entrypoint.sh
     volumes:
-      - "sentry-postgres:/var/lib/postgresql/data"
+      - "/data/tncdata/sentry/sentry-postgres:/var/lib/postgresql/data"
       - type: bind
         read_only: true
         source: ./postgres/
@@ -158,8 +164,8 @@ services:
       ZOOKEEPER_TOOLS_LOG4J_LOGLEVEL: "WARN"
       KAFKA_OPTS: "-Dzookeeper.4lw.commands.whitelist=ruok"
     volumes:
-      - "sentry-zookeeper:/var/lib/zookeeper/data"
-      - "sentry-zookeeper-log:/var/lib/zookeeper/log"
+      - "/data/tncdata/sentry/sentry-zookeeper:/var/lib/zookeeper/data"
+      - "/data/tncdata/sentry/sentry-zookeeper-log:/var/lib/zookeeper/log"
       - "sentry-secrets:/etc/zookeeper/secrets"
     healthcheck:
       <<: *healthcheck_defaults
@@ -184,8 +190,8 @@ services:
       KAFKA_LOG4J_ROOT_LOGLEVEL: "WARN"
       KAFKA_TOOLS_LOG4J_LOGLEVEL: "WARN"
     volumes:
-      - "sentry-kafka:/var/lib/kafka/data"
-      - "sentry-kafka-log:/var/lib/kafka/log"
+      - "/data/tncdata/sentry/sentry-kafka:/var/lib/kafka/data"
+      - "/data/tncdata/sentry/sentry-kafka-log:/var/lib/kafka/log"
       - "sentry-secrets:/etc/kafka/secrets"
     healthcheck:
       <<: *healthcheck_defaults
@@ -197,14 +203,15 @@ services:
       context:
         ./clickhouse
       args:
-        BASE_IMAGE: "${CLICKHOUSE_IMAGE:-}"
+        #BASE_IMAGE: "${CLICKHOUSE_IMAGE:-}"
+        BASE_IMAGE: "yandex/clickhouse-server:20.3.9.70"
     ulimits:
       nofile:
         soft: 262144
         hard: 262144
     volumes:
-      - "sentry-clickhouse:/var/lib/clickhouse"
-      - "sentry-clickhouse-log:/var/log/clickhouse-server"
+      - "/data/tncdata/sentry/sentry-clickhouse:/var/lib/clickhouse"
+      - "/data/tncdata/sentry/sentry-clickhouse-log:/var/log/clickhouse-server"
       - type: bind
         read_only: true
         source: ./clickhouse/config.xml
@@ -213,7 +220,7 @@ services:
       # This limits Clickhouse's memory to 30% of the host memory
       # If you have high volume and your search return incomplete results
       # You might want to change this to a higher value (and ensure your host has enough memory)
-      MAX_MEMORY_USAGE_RATIO: 0.3
+      MAX_MEMORY_USAGE_RATIO: 0.4
     healthcheck:
       test:
         [
@@ -388,11 +395,17 @@ volumes:
   sentry-symbolicator:
     external: true

+  sentry-zookeeper-log:
+    external: true
+  sentry-kafka-log:
+    external: true
+  sentry-clickhouse-log:
+    external: true
+  geoip:
+    external: true
+
   # These store ephemeral data that needn't persist across restarts.
   sentry-secrets:
   sentry-smtp:
   sentry-nginx-cache:
-  sentry-zookeeper-log:
-  sentry-kafka-log:
   sentry-smtp-log:
-  sentry-clickhouse-log:

Incoming errors are affected:

[Image: SEVaVIO9n1]

But transactions are not affected (except when Sentry was down for a while):

[Image: Usage-Stats-Sentry]

Is there a way to try to re-run the migrations? Or rollback?

PS: We run the postgres database on a separate host, version 9.5.14.

sentry_install_log-2022-07-27_09-22-24.txt

sovaa, Jul 28 '22 02:07

Hm, based on

stat: cannot statx '/data': Operation not permitted

and your docker-compose.yml changes, it seems that the bind mount you're using doesn't have the correct permissions for clickhouse or some other service. I would recommend checking that the snuba user in the snuba container has access to that location.

Once you've done that, you should be able to re-run ./install.sh and it should fix things if I'm not mistaken.
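One way to sanity-check the bind-mount permissions from the host is sketched below. It is only a rough approximation: it looks at the classic owner/group/other mode bits and ignores ACLs, SELinux labels, supplementary groups, and user-namespace remapping. The directory list is copied from the diff in this issue; `CONTAINER_UID` and `CONTAINER_GID` are placeholder values, so look up the real ones first (for example by running `id` inside the relevant container).

```python
import os
import stat

# Placeholder values (assumption): replace with the UID/GID that the
# service actually runs as inside its container.
CONTAINER_UID = 1000
CONTAINER_GID = 1000

# Host-side directories taken from the bind mounts in the diff above.
HOST_DIRS = [
    "/data/tncdata/sentry/sentry-data",
    "/data/tncdata/sentry/sentry-clickhouse",
    "/data/tncdata/sentry/sentry-kafka",
    "/data/tncdata/sentry/sentry-zookeeper",
]


def can_access(path, uid, gid):
    """Return (read, write, traverse) for uid/gid using only the mode bits."""
    st = os.stat(path)
    if uid == 0:
        # root normally bypasses mode-bit checks.
        return (True, True, True)
    if st.st_uid == uid:
        bits = (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR)
    elif st.st_gid == gid:
        bits = (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP)
    else:
        bits = (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH)
    return tuple(bool(st.st_mode & b) for b in bits)


if __name__ == "__main__":
    for path in HOST_DIRS:
        if not os.path.isdir(path):
            print(f"{path}: missing on this host")
            continue
        read, write, traverse = can_access(path, CONTAINER_UID, CONTAINER_GID)
        print(f"{path}: read={read} write={write} traverse={traverse}")
```

Any directory that reports `write=False` or `traverse=False` for the container's user is a candidate for a `chown`/`chmod` fix before re-running `./install.sh`.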

emmatyping, Jul 28 '22 16:07

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

github-actions[bot], Aug 19 '22 00:08