OffsetOutOfRange errors have returned
Self-Hosted Version
24.3.0
CPU Architecture
x86_64
Docker Version
26.0.0 / 24.0.7
Docker Compose Version
2.25.0 / 2.21.1
Steps to Reproduce
The OffsetOutOfRange errors (discussed before in https://github.com/getsentry/self-hosted/issues/1894) have spontaneously returned on 2 out of my 3 self-hosted installations. This is mostly visible due to the alert no longer being executed.
For one of them, I removed the kafka and zookeeper volumes last week to solve the issue, but that fix was only temporary, as the errors have returned. The other one only caught my attention just now.
As this might be related to https://github.com/getsentry/self-hosted/issues/2931 and https://github.com/getsentry/self-hosted/issues/2876, I will now remove the kafka and zookeeper volumes again and replace the rust-consumers with consumer.
I'm also seeing https://github.com/getsentry/snuba/issues/5707 on the other instance, so I will switch that one to the non-rust consumers as well.
Expected Result
Well, no errors, and events being processed correctly 😄
Actual Result
sentry-self-hosted-post-process-forwarder-errors-1 | 11:17:54 [INFO] arroyo.processing.processor: Processor terminated
sentry-self-hosted-post-process-forwarder-transactions-1 | 11:17:54 [INFO] arroyo.processing.processor: New partitions assigned: {Partition(topic=Topic(name='transactions'), index=0): 0}
sentry-self-hosted-post-process-forwarder-transactions-1 | 11:17:54 [INFO] sentry.post_process_forwarder.post_process_forwarder: Starting multithreaded post process forwarder
sentry-self-hosted-post-process-forwarder-errors-1 | Traceback (most recent call last):
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/bin/sentry", line 8, in <module>
sentry-self-hosted-post-process-forwarder-errors-1 | sys.exit(main())
sentry-self-hosted-post-process-forwarder-errors-1 | ^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/sentry/runner/__init__.py", line 190, in main
sentry-self-hosted-post-process-forwarder-errors-1 | func(**kwargs)
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
sentry-self-hosted-post-process-forwarder-errors-1 | return self.main(*args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
sentry-self-hosted-post-process-forwarder-errors-1 | rv = self.invoke(ctx)
sentry-self-hosted-post-process-forwarder-errors-1 | ^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
sentry-self-hosted-post-process-forwarder-errors-1 | return _process_result(sub_ctx.command.invoke(sub_ctx))
sentry-self-hosted-post-process-forwarder-errors-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
sentry-self-hosted-post-process-forwarder-errors-1 | return _process_result(sub_ctx.command.invoke(sub_ctx))
sentry-self-hosted-post-process-forwarder-errors-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
sentry-self-hosted-post-process-forwarder-errors-1 | return ctx.invoke(self.callback, **ctx.params)
sentry-self-hosted-post-process-forwarder-errors-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
sentry-self-hosted-post-process-forwarder-errors-1 | return __callback(*args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
sentry-self-hosted-post-process-forwarder-errors-1 | return f(get_current_context(), *args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/sentry/runner/decorators.py", line 69, in inner
sentry-self-hosted-post-process-forwarder-errors-1 | return ctx.invoke(f, *args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
sentry-self-hosted-post-process-forwarder-errors-1 | return __callback(*args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
sentry-self-hosted-post-process-forwarder-errors-1 | return f(get_current_context(), *args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/sentry/runner/decorators.py", line 29, in inner
sentry-self-hosted-post-process-forwarder-errors-1 | return ctx.invoke(f, *args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
sentry-self-hosted-post-process-forwarder-errors-1 | return __callback(*args, **kwargs)
sentry-self-hosted-post-process-forwarder-errors-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/sentry/runner/commands/run.py", line 448, in basic_consumer
sentry-self-hosted-post-process-forwarder-errors-1 | run_processor_with_signals(processor, consumer_name)
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/sentry/utils/kafka.py", line 46, in run_processor_with_signals
sentry-self-hosted-post-process-forwarder-errors-1 | processor.run()
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 322, in run
sentry-self-hosted-post-process-forwarder-errors-1 | self._run_once()
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 384, in _run_once
sentry-self-hosted-post-process-forwarder-errors-1 | self.__message = self.__consumer.poll(timeout=1.0)
sentry-self-hosted-post-process-forwarder-errors-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 235, in poll
sentry-self-hosted-post-process-forwarder-errors-1 | message = self.__consumer.poll(timeout)
sentry-self-hosted-post-process-forwarder-errors-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sentry-self-hosted-post-process-forwarder-errors-1 | File "/usr/local/lib/python3.11/site-packages/arroyo/backends/kafka/consumer.py", line 414, in poll
sentry-self-hosted-post-process-forwarder-errors-1 | raise OffsetOutOfRange(str(error))
sentry-self-hosted-post-process-forwarder-errors-1 | arroyo.errors.OffsetOutOfRange: KafkaError{code=_AUTO_OFFSET_RESET,val=-140,str="fetch failed due to requested offset not available on the broker: Broker: Offset out of range (broker 1001)"}
sentry-self-hosted-post-process-forwarder-errors-1 exited with code 0
Event ID
No response
Same here. So far the same consumer group & topic it seems. Consumer group: post-process-forwarder Topic: events
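To check which containers are affected on a given installation, the compose logs can be filtered for the exception line shown in the traceback above. This is only a minimal sketch: `affected_containers` is a hypothetical helper name, and it assumes the `container | message` log prefix seen in the output above.

```shell
# Hypothetical helper: reads compose log lines on stdin and prints the name of
# every container that logged the arroyo OffsetOutOfRange exception, once each.
affected_containers() {
  grep 'arroyo.errors.OffsetOutOfRange' \
    | awk -F'|' '{ sub(/[ \t]+$/, "", $1); print $1 }' \
    | sort -u
}
```

It could be fed with something like `docker compose logs --no-color --since 24h | affected_containers`.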
I'm sorry I can't hold this back
Jokes aside, I can't reproduce this on my end since I don't use Kafka anymore (I replaced it with Redpanda and have had no errors like this). Does this command still work?
sudo docker compose down && # Shut everything down first; only Kafka should be running for the next steps
sudo docker compose up -d --wait kafka && \
sudo docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group post-process-forwarder --delete && # Delete the post-process-forwarder consumer group
sudo docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group post-process-forwarder --topic events --delete-offsets && # Delete the events topic offsets from the post-process-forwarder consumer group
sudo docker compose up -d # Start everything again
Let me know if that works.
@aldy505 Well, that approach could work too, however what I do in these cases is just a simple offset reset:
docker compose down -v
docker compose --env-file .env.custom up -d kafka
docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --reset-offsets --to-latest --execute --group post-process-forwarder --topic events
docker compose --env-file .env.custom up -d
or you can "optimistically" reset all of them:
docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --reset-offsets --to-latest --execute --all-groups --all-topics
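Before running either reset with --execute, it may be worth looking at the group's current offsets and previewing what the reset would change. The sketch below wraps the same kafka-consumer-groups tool with its --describe and --dry-run options. The function names are hypothetical, and the `kafka` service name and `kafka:9092` address are simply taken from the commands above.

```shell
# Hypothetical wrapper around the tool invocation used in this thread.
kafka_groups() {
  docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 "$@"
}

# Show current offset, log end offset and lag per partition for one group.
describe_group() {
  kafka_groups --describe --group "$1"
}

# Print the offsets a reset to latest would produce, without applying them.
preview_reset() {
  kafka_groups --reset-offsets --to-latest --dry-run --group "$1" --topic "$2"
}
```

For example, `describe_group post-process-forwarder` followed by `preview_reset post-process-forwarder events`. Kafka only resets offsets for a group with no active members, so the consumers still have to be stopped first, as in the commands above.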
Unsure why this is only happening for post-process-forwarder, since I don't think that was converted to rust-consumer. Wondering if we are missing a --no-strict-offset-reset on the post-process-forwarder containers. Could you try adding that in the docker compose file? @hostalp @bobvandevijver
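To see which post-process-forwarder services still lack the flag, the compose file can be searched directly. This is a rough sketch, not an official check: `forwarders_missing_flag` is a hypothetical helper, and it assumes each forwarder's `command:` is written on a single line that contains `post-process-forwarder`.

```shell
# Hypothetical helper: print the "command:" lines (with line numbers) that
# mention post-process-forwarder but do not contain --no-strict-offset-reset.
# Argument: path to the compose file (defaults to docker-compose.yml).
# Returns non-zero when no such line is found.
forwarders_missing_flag() {
  grep -nE 'command:.*post-process-forwarder' "${1:-docker-compose.yml}" \
    | grep -v -- '--no-strict-offset-reset'
}
```

Each line it prints is a place to append --no-strict-offset-reset before recreating the containers with `docker compose up -d`.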
+1
It looks like reverting to the non-rust consumers has fixed it for now: I haven't seen the offset issue return since I created the ticket and removed the kafka volumes.
I, on the other hand, have set the suggested --no-strict-offset-reset flag on all 3 post-process-forwarder consumers; however, it may take days or even weeks to find out whether it really helped.
I also reverted to the non-rust consumers, but today the 3 post-process-forwarder consumers failed again with OffsetOutOfRange errors.
I will try to go back to the rust-consumers now, and add --no-strict-offset-reset to the post-process-forwarders.
Does this command still work [...] Let me know if that works.
It works :-)
I've since added --no-strict-offset-reset and have had no crashes (including the post-process-forwarders, which crashed every 5 days or so before I added the option).
For me, this seems to have successfully fixed the issue, with seemingly no side effects :-)
I concur.
@hubertdeng123 @azaslavsky Do you think it's safe to put --no-strict-offset-reset on some of the containers that don't have it by default (as in, hardcoded in the docker-compose.yml)? Can you validate that with the code owners on Slack? Thanks!
~~Note that I did not add the --no-strict-offset-reset option, I only switched to the non-rust consumers. And the error hasn't returned for us since.~~
Update June 4th: The error did return, so I have now added --no-strict-offset-reset and reverted back to the rust consumers.
I encountered the same issue. Initially, I upgraded to version 24.4.2, but ran into this problem: https://github.com/getsentry/self-hosted/issues/2876. Consequently, after restoring the entire Sentry system, I decided to upgrade to version 24.2.0 since it does not contain any rust-consumer.
Unfortunately, I encountered this issue, which was quite disappointing!
However, echoing what @hubertdeng123 suggested, adding --no-strict-offset-reset to the post-process-forwarder containers resolved the issue for me.
Additionally, I share @aldy505's concern: I don't know whether it is safe to add --no-strict-offset-reset.
It should be safe to do, looks like we do that in prod. I'm going to put up a PR to add this option to the post process forwarders.
It should be safe to do, looks like we do that in prod. I'm going to put up a PR to add this option to the post process forwarders.
@hubertdeng123 I found that this PR was released in 24.5.1, but because of these issues (https://github.com/getsentry/self-hosted/issues/2876 and https://github.com/getsentry/snuba/issues/5707), we cannot upgrade.
@liukch Those two are separate issues. The massive ClickHouse logs don't really cause any ingestion or event issues on the running Sentry instance, so I'm wondering what is happening on your side that means you "can not do any upgrade". Would you please expand on that in a separate (or perhaps more relevant) issue?
@aldy505 ClickHouse generates a large number of logs, which is a serious problem in itself. Additionally, while generating a large number of logs, it can also cause transactions not to be accepted, as mentioned in https://github.com/getsentry/self-hosted/issues/2876