self-hosted
self-hosted copied to clipboard
snuba-subscription-consumer-events restarts with different KeyError:
Self-Hosted Version
23.1.1
CPU Architecture
x86_64
Docker Version
18.03.1-ce
Docker Compose Version
1.29.2
Machine Specification
- [x] My system meets the minimum system requirements of Sentry
Steps to Reproduce
- run docker-compose sentry v23.1.1 on more than 1 VM(server)
- change kafka partitions for topic
snuba-commit-log - change TOPIC_PARTITION_COUNTS to represent actual number of partitions for the given kafka topic
- run
snuba-subscription-consumer-events
Expected Result
snuba-subscription-consumer-events service runs without errors and restarts
Actual Result
I've read somewhere that the type of error I've encountered is somehow connected with recent changes in kafka partition number for the topic.
So I've got snuba-subscription-consumer-events service constantly restarting with such errors:
Click to expand logs
2025-02-07 15:18:06,869 Initializing Snuba...
2025-02-07 15:18:10,564 Snuba initialization took 3.6961687952280045s
2025-02-07 15:18:11,149 Initializing Snuba...
2025-02-07 15:18:14,786 Snuba initialization took 3.637780174612999s
2025-02-07 15:18:17,515 New partitions assigned: {Partition(topic=Topic(name='snuba-commit-log'), index=0): 40134965}
2025-02-07 15:18:17,530 Caught exception, shutting down...
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 175, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 225, in _run_once
self.__processing_strategy.submit(message)
File "/usr/src/snuba/snuba/subscriptions/scheduler_processing_strategy.py", line 240, in submit
self.__next_step.submit(message)
File "/usr/src/snuba/snuba/subscriptions/combined_scheduler_executor.py", line 246, in submit
tasks.extend([task for task in entity_scheduler[tick.partition].find(tick)])
KeyError: 10
2025-02-07 15:18:17,532 Closing <snuba.subscriptions.scheduler_consumer.CommitLogTickConsumer object at 0x7efcd0610250>...
2025-02-07 15:18:17,533 Partitions revoked: [Partition(topic=Topic(name='snuba-commit-log'), index=0)]
2025-02-07 15:18:17,535 Processor terminated
Traceback (most recent call last):
File "/usr/local/bin/snuba", line 33, in <module>
sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/src/snuba/snuba/cli/subscriptions_scheduler_executor.py", line 140, in subscriptions_scheduler_executor
processor.run()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 175, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 225, in _run_once
self.__processing_strategy.submit(message)
File "/usr/src/snuba/snuba/subscriptions/scheduler_processing_strategy.py", line 240, in submit
self.__next_step.submit(message)
File "/usr/src/snuba/snuba/subscriptions/combined_scheduler_executor.py", line 246, in submit
tasks.extend([task for task in entity_scheduler[tick.partition].find(tick)])
KeyError: 10
2025-02-07 15:18:26,443 Initializing Snuba...
2025-02-07 15:18:30,162 Snuba initialization took 3.7202555276453495s
2025-02-07 15:18:30,754 Initializing Snuba...
2025-02-07 15:18:34,535 Snuba initialization took 3.7820699214935303s
2025-02-07 15:18:35,411 New partitions assigned: {Partition(topic=Topic(name='snuba-commit-log'), index=1): 104750374}
2025-02-07 15:18:35,418 Caught exception, shutting down...
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 175, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 225, in _run_once
self.__processing_strategy.submit(message)
File "/usr/src/snuba/snuba/subscriptions/scheduler_processing_strategy.py", line 240, in submit
self.__next_step.submit(message)
File "/usr/src/snuba/snuba/subscriptions/combined_scheduler_executor.py", line 246, in submit
tasks.extend([task for task in entity_scheduler[tick.partition].find(tick)])
KeyError: 3
2025-02-07 15:18:35,420 Closing <snuba.subscriptions.scheduler_consumer.CommitLogTickConsumer object at 0x7f0ed5eee250>...
2025-02-07 15:18:35,420 Partitions revoked: [Partition(topic=Topic(name='snuba-commit-log'), index=1)]
2025-02-07 15:18:35,425 Processor terminated
Traceback (most recent call last):
File "/usr/local/bin/snuba", line 33, in <module>
sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/src/snuba/snuba/cli/subscriptions_scheduler_executor.py", line 140, in subscriptions_scheduler_executor
processor.run()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 175, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 225, in _run_once
self.__processing_strategy.submit(message)
File "/usr/src/snuba/snuba/subscriptions/scheduler_processing_strategy.py", line 240, in submit
self.__next_step.submit(message)
File "/usr/src/snuba/snuba/subscriptions/combined_scheduler_executor.py", line 246, in submit
tasks.extend([task for task in entity_scheduler[tick.partition].find(tick)])
KeyError: 3
2025-02-07 15:18:44,927 Initializing Snuba...
2025-02-07 15:18:48,769 Snuba initialization took 3.842433415353298s
2025-02-07 15:18:49,370 Initializing Snuba...
2025-02-07 15:18:53,027 Snuba initialization took 3.6579839438199997s
2025-02-07 15:18:54,882 New partitions assigned: {Partition(topic=Topic(name='snuba-commit-log'), index=4): 55029728}
2025-02-07 15:18:54,896 Caught exception, shutting down...
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 175, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 225, in _run_once
self.__processing_strategy.submit(message)
File "/usr/src/snuba/snuba/subscriptions/scheduler_processing_strategy.py", line 240, in submit
self.__next_step.submit(message)
File "/usr/src/snuba/snuba/subscriptions/combined_scheduler_executor.py", line 246, in submit
tasks.extend([task for task in entity_scheduler[tick.partition].find(tick)])
KeyError: 2
2025-02-07 15:18:54,898 Closing <snuba.subscriptions.scheduler_consumer.CommitLogTickConsumer object at 0x7f3e80843250>...
2025-02-07 15:18:54,898 Partitions revoked: [Partition(topic=Topic(name='snuba-commit-log'), index=4)]
2025-02-07 15:18:54,902 Processor terminated
Traceback (most recent call last):
File "/usr/local/bin/snuba", line 33, in <module>
sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/src/snuba/snuba/cli/subscriptions_scheduler_executor.py", line 140, in subscriptions_scheduler_executor
processor.run()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 175, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 225, in _run_once
self.__processing_strategy.submit(message)
File "/usr/src/snuba/snuba/subscriptions/scheduler_processing_strategy.py", line 240, in submit
self.__next_step.submit(message)
File "/usr/src/snuba/snuba/subscriptions/combined_scheduler_executor.py", line 246, in submit
tasks.extend([task for task in entity_scheduler[tick.partition].find(tick)])
KeyError: 2
2025-02-07 15:19:02,043 Initializing Snuba...
2025-02-07 15:19:05,724 Snuba initialization took 3.6818041875958443s
2025-02-07 15:19:06,300 Initializing Snuba...
2025-02-07 15:19:09,999 Snuba initialization took 3.6995532028377056s
2025-02-07 15:19:13,070 New partitions assigned: {Partition(topic=Topic(name='snuba-commit-log'), index=4): 55029728}
2025-02-07 15:19:13,086 Caught exception, shutting down...
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 175, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 225, in _run_once
self.__processing_strategy.submit(message)
File "/usr/src/snuba/snuba/subscriptions/scheduler_processing_strategy.py", line 240, in submit
self.__next_step.submit(message)
File "/usr/src/snuba/snuba/subscriptions/combined_scheduler_executor.py", line 246, in submit
tasks.extend([task for task in entity_scheduler[tick.partition].find(tick)])
KeyError: 2
2025-02-07 15:19:13,087 Closing <snuba.subscriptions.scheduler_consumer.CommitLogTickConsumer object at 0x7fc729bf0250>...
2025-02-07 15:19:13,088 Partitions revoked: [Partition(topic=Topic(name='snuba-commit-log'), index=4)]
2025-02-07 15:19:13,089 Processor terminated
Traceback (most recent call last):
File "/usr/local/bin/snuba", line 33, in <module>
sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/src/snuba/snuba/cli/subscriptions_scheduler_executor.py", line 140, in subscriptions_scheduler_executor
processor.run()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 175, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 225, in _run_once
self.__processing_strategy.submit(message)
File "/usr/src/snuba/snuba/subscriptions/scheduler_processing_strategy.py", line 240, in submit
self.__next_step.submit(message)
File "/usr/src/snuba/snuba/subscriptions/combined_scheduler_executor.py", line 246, in submit
tasks.extend([task for task in entity_scheduler[tick.partition].find(tick)])
KeyError: 2
KeyError number can be different. I've seen 2,4,14,16 :)
My snuba-config/setting.py contains TOPIC_PARTITION_COUNTS as follows:
***
TOPIC_PARTITION_COUNTS = Mapping[str, int] = {
"event-replacements": 5,
"events": 20,
"events-subscription-results": 5,
"ingest-sessions": 5,
"outcomes": 5,
"snuba-commit-log": 5,
"snuba-transactions-commit-log": 5,
"transactions-subscription-results": 5
}
in docker-compose I also have similar configs. Added to see mb service can read only one or another option.
***
snuba-subscription-consumer-events:
***
TOPIC_PARTITION_COUNTS: '{"event-replacements": 5,"events": 20,"events-subscription-results": 5,"ingest-sessions": 5,"outcomes": 5,"snuba-commit-log": 5,"snuba-transactions-commit-log": 5,"transactions-subscription-results": 5}'
command: subscriptions-scheduler-executor --dataset events --entity events --auto-offset-reset=latest --no-strict-offset-reset --consumer-group=snuba-events-subscriptions-consumers --followed-consumer-group=snuba-consumers --delay-seconds=60 --schedule-ttl=60 --stale-threshold-seconds=900
***
And my kafka group\topic is constantly in state rebalancing or has no active members
lag is growing.
Had 5 partitions as you can see below:
Warning: Consumer group 'snuba-events-subscriptions-consumers' is rebalancing.
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
snuba-events-subscriptions-consumers snuba-commit-log 4 55029728 55030657 929 - - -
snuba-events-subscriptions-consumers snuba-commit-log 1 104750374 104751516 1142 - - -
snuba-events-subscriptions-consumers snuba-commit-log 3 87189654 87193048 3394 - - -
snuba-events-subscriptions-consumers snuba-commit-log 2 73663948 73665379 1431 - - -
snuba-events-subscriptions-consumers snuba-commit-log 0 40134965 40135652 687 - - -
snuba-transactions-commit-log lag is also growing, but i guess it is a different issue. Consumer containers for that group assigned to a correct partition indexes and don't restarting.
Event ID
No response
any suggestions would be awesome :)
same issue
Self-Hosted Version 25.2.0
CPU Architecture x86_64
Docker Version Docker version 27.5.1, build 9f9e405
Docker Compose Version Docker Compose version v2.32.4-desktop.1
Machine Specification MacBook Pro(16 英寸,2019 年) 处理器:2.4 GHz 八核Intel Core i9 内存:32 GB 2667 MHz DDR4
My system meets the minimum system requirements of Sentry
logs:⬇️
Docker images built.
▶ Bootstrapping and migrating Snuba ...
Network sentry-self-hosted_default Creating
Network sentry-self-hosted_default Created
Container sentry-self-hosted-redis-1 Creating
Container sentry-self-hosted-kafka-1 Creating
Container sentry-self-hosted-clickhouse-1 Creating
Container sentry-self-hosted-clickhouse-1 Created
Container sentry-self-hosted-redis-1 Created
Container sentry-self-hosted-kafka-1 Created
Container sentry-self-hosted-clickhouse-1 Starting
Container sentry-self-hosted-kafka-1 Starting
Container sentry-self-hosted-redis-1 Starting
Container sentry-self-hosted-kafka-1 Started
Container sentry-self-hosted-clickhouse-1 Started
Container sentry-self-hosted-redis-1 Started
2025-02-26 13:33:19,950 Initializing Snuba...
2025-02-26 13:33:24,048 Snuba initialization took 4.098573600000236s
2025-02-26 13:33:24,656 Initializing Snuba...
2025-02-26 13:33:28,653 Snuba initialization took 3.9985657639990677s
2025-02-26 13:33:28,662 Attempting to connect to Kafka (attempt 0)...
2025-02-26 13:33:28,691 Connected to Kafka on attempt 0
2025-02-26 13:33:28,769 Creating Kafka topics...
Traceback (most recent call last):
File "/usr/local/bin/snuba", line 33, in
Cleaning up...
Heya @untitaker can you take a look at this? Thanks!
@phacops @onewland i think this is for your team
Well. I dealt with this issue by recreating all topics with partition number of 1.
That way you don't have to maintain TOPIC_PARTITION_COUNTS anymore.
And iirc cleanup of old records from tables where partition != 0 or just sentry-cleanup of old data.