kafka topic offset error

Open jason-desygner opened this issue 1 year ago • 9 comments

Self-Hosted Version

23.12.1

CPU Architecture

x86_64

Docker Version

24.0.7

Docker Compose Version

v2.21.0

Steps to Reproduce

After the update I saw high load, so I turned off DNS to Sentry for 24 hours to see whether it was coming from our apps. Since then, one of the services keeps crashing.

Expected Result

working sentry

Actual Result

00:33:49 [INFO] arroyo.processing.processor: New partitions assigned: {Partition(topic=Topic(name='ingest-transactions'), index=0): 236826666}
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/processor.py", line 319, in run
    self._run_once()
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/processor.py", line 381, in _run_once
    self.__message = self.__consumer.poll(timeout=1.0)
  File "/usr/local/lib/python3.10/site-packages/arroyo/backends/kafka/consumer.py", line 414, in poll
    raise OffsetOutOfRange(str(error))
arroyo.errors.OffsetOutOfRange: KafkaError{code=_AUTO_OFFSET_RESET,val=-140,str="fetch failed due to requested offset not available on the broker: Broker: Offset out of range (broker 1001)"}
00:33:49 [ERROR] arroyo.processing.processor: Caught exception, shutting down...
00:33:49 [INFO] arroyo.processing.processor: Closing <arroyo.backends.kafka.consumer.KafkaConsumer object at 0x7f0198aa8700>...
00:33:49 [INFO] arroyo.processing.processor: Partitions to revoke: [Partition(topic=Topic(name='ingest-transactions'), index=0)]
00:33:49 [INFO] arroyo.processing.processor: Partition revocation complete.
00:33:49 [INFO] arroyo.processing.processor: Processor terminated
Traceback (most recent call last):
  File "/usr/local/bin/sentry", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/sentry/runner/__init__.py", line 195, in main
    func(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/sentry/runner/decorators.py", line 69, in inner
    return ctx.invoke(f, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/sentry/runner/decorators.py", line 29, in inner
    return ctx.invoke(f, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/sentry/runner/commands/run.py", line 602, in basic_consumer
    run_processor_with_signals(processor)
  File "/usr/local/lib/python3.10/site-packages/sentry/utils/kafka.py", line 13, in run_processor_with_signals
    processor.run()
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/processor.py", line 319, in run
    self._run_once()
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/processor.py", line 381, in _run_once
    self.__message = self.__consumer.poll(timeout=1.0)
  File "/usr/local/lib/python3.10/site-packages/arroyo/backends/kafka/consumer.py", line 414, in poll
    raise OffsetOutOfRange(str(error))
arroyo.errors.OffsetOutOfRange: KafkaError{code=_AUTO_OFFSET_RESET,val=-140,str="fetch failed due to requested offset not available on the broker: Broker: Offset out of range (broker 1001)"}

Event ID

No response

jason-desygner avatar Dec 26 '23 00:12 jason-desygner

I have tried resetting the offsets, following other GitHub issues:

https://github.com/getsentry/self-hosted/issues/1894

docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --topic ingest-transactions --delete-offsets

docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --topic ingest-sessions --delete-offsets
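Before deleting offsets, it may help to confirm which consumer group actually reads the affected topic and where its committed offset sits. The following is a minimal sketch, assuming the same `kafka` service and `kafka:9092` bootstrap address used in the commands above. The `kafka_groups` helper and the `DRY_RUN` switch are made up for illustration; `--list` and `--describe` are standard `kafka-consumer-groups` options.

```shell
# Hypothetical helper: runs kafka-consumer-groups inside the kafka container.
# With DRY_RUN=1 (the default here) it only prints the command it would run.
kafka_groups() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 $*"
  else
    docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 "$@"
  fi
}

# List every consumer group known to the broker.
kafka_groups --list

# Show committed offset, log-end offset and lag per partition for one group.
kafka_groups --group snuba-consumers --describe
```

Run it with `DRY_RUN=0` to execute the commands for real. If the group reading `ingest-transactions` turns out not to be `snuba-consumers`, the `--delete-offsets` commands above would presumably need that group's name instead.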

ghost avatar Dec 26 '23 00:12 ghost

One thing you can try, when your Sentry traffic is at its lowest, is to start Kafka from scratch. That means re-creating the Kafka volume and clearing out the Kafka logs volume, while leaving everything else as it is. Then run ./install.sh again: it will create the Kafka topics from scratch, and every offset will be back at 0.

This also means data loss for error events (or performance/tracing, replay, or profiling data) submitted in the last 5 to 15 minutes or so. But if you're okay with that, I think it's one way to resolve things quickly.
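The reset described above could look roughly like the sketch below. It assumes the volume names `sentry-kafka` and `sentry-zookeeper`; check `docker volume ls` on your install first, since names (especially for the Kafka/ZooKeeper log volumes, which are not handled here) can differ. It prints the commands by default rather than running them, because removing these volumes discards any events still queued in Kafka. The `run` helper, the `reset_kafka` function and the `DRY_RUN` variable are illustrative, not part of self-hosted.

```shell
# Print-by-default wrapper (hypothetical): set DRY_RUN=0 to really run.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stops at the first failing step when run for real.
reset_kafka() {
  run docker compose down &&                # stop all Sentry services
  run docker volume rm sentry-kafka &&      # assumed volume name
  run docker volume rm sentry-zookeeper &&  # assumed volume name
  run ./install.sh &&                       # recreate volumes and Kafka topics
  run docker compose up -d                  # start everything again
}

reset_kafka
```

Run it from the self-hosted checkout directory, and review the printed commands before switching to `DRY_RUN=0`.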

aldy505 avatar Dec 27 '23 09:12 aldy505

I get the same error message in these containers:

sentry-self-hosted-snuba-transactions-consumer-1
sentry-self-hosted-transactions-consumer-1
sentry-self-hosted-post-process-forwarder-transactions-1
sentry-self-hosted-generic-metrics-consumer-1
sentry-self-hosted-billing-metrics-consumer-1

I tried the fix from https://github.com/getsentry/self-hosted/issues/1894#issuecomment-1426538497 but nothing changed. On top of that, my Sentry performance report has had no new data for about 5 days.

ghun131 avatar Jan 03 '24 09:01 ghun131

If that still doesn't work, I'd try recreating your Kafka/ZooKeeper volumes, as mentioned above.

hubertdeng123 avatar Jan 04 '24 23:01 hubertdeng123

@hubertdeng123 Could it be that my instance is out of disk space? I have a 320 GB instance, and when I ran docker system df, Sentry only took about 192 GB.

ghun131 avatar Jan 05 '24 14:01 ghun131

@ghun131 Yes, that's one possibility. You can clean up some of your Docker junk without losing your Sentry data by pruning unused images, build cache, volumes, and networks.

Execute this while Sentry is running:

sudo docker container prune -f
sudo docker builder prune -f
sudo docker image prune -f --all

Then check your Docker logs: are they taking up too much space? If so, clear those out first.

Then check how much space each of your Sentry database/storage volumes takes. Is any of them too large?

sudo docker volume ls
sudo docker volume inspect sentry-data
sudo du -hs 'copy the volume mount point path here'
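To avoid copying each mount point by hand, the three commands above can be combined into a loop. This is a sketch only: the `sentry_volume_sizes` function name is made up, and it assumes your Sentry volumes all have "sentry" in their names and that `sudo` is needed to read Docker's data directory.

```shell
# Hypothetical helper: print the on-disk size of every Docker volume whose
# name contains "sentry", one "SIZE<TAB>VOLUME" line per volume.
sentry_volume_sizes() {
  docker volume ls --quiet --filter name=sentry | while read -r vol; do
    # Resolve the volume's mount point on the host.
    mountpoint="$(docker volume inspect --format '{{ .Mountpoint }}' "$vol")"
    # du prints "SIZE<TAB>PATH"; keep only the size column.
    size="$(sudo du -sh "$mountpoint" | cut -f1)"
    printf '%s\t%s\n' "$size" "$vol"
  done
}
```

Calling `sentry_volume_sizes` prints one line per volume; on systems with GNU coreutils, piping the result through `sort -h` puts the largest volumes last.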

aldy505 avatar Jan 05 '24 14:01 aldy505

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

getsantry[bot] avatar Jan 28 '24 08:01 getsantry[bot]

Thank you for your answer. I don't think it takes up too much storage:

image

But my boss just told me it's OK for us to store only 45 days of data, so I killed off this instance and added a new one. Everything's working well.

ghun131 avatar Jan 28 '24 14:01 ghun131
