sentry-self-hosted-post-process-forwarder-errors-1 constantly restarts after updating from 23.6.2 to 24.4.2
Self-Hosted Version
24.4.2
CPU Architecture
x86_64
Docker Version
24.0.2
Docker Compose Version
2.18.1
Steps to Reproduce
- wget https://github.com/getsentry/self-hosted/archive/refs/tags/24.4.2.tar.gz
- tar -zxvf 24.4.2.tar.gz
- mv self-hosted-24.4.2 sentry
- cd sentry
- ./install.sh
- docker compose up -d
Expected Result
All Sentry containers are running.
Actual Result
The sentry-self-hosted-post-process-forwarder-errors-1 container keeps restarting.
Log:
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 177, in __check_commit_log_worker_running
self.__commit_log_worker.result()
File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/usr/local/lib/python3.11/site-packages/arroyo/utils/concurrent.py", line 31, in run
result = function()
^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 137, in __run_commit_log_worker
commit = commit_codec.decode(message.payload)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/arroyo/backends/kafka/commit.py", line 51, in decode
return self.decode_legacy(value)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/arroyo/backends/kafka/commit.py", line 84, in decode_legacy
headers["orig_message_ts"].decode("utf-8"), DATETIME_FORMAT
~~~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: 'orig_message_ts'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 322, in run
self._run_once()
File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 384, in _run_once
self.__message = self.__consumer.poll(timeout=1.0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 211, in poll
self.__check_commit_log_worker_running()
File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 179, in __check_commit_log_worker_running
raise RuntimeError("commit log consumer thread crashed") from e
RuntimeError: commit log consumer thread crashed
18:58:13 [ERROR] arroyo.processing.processor: Caught exception, shutting down...
18:58:13 [INFO] arroyo.processing.processor: Closing <sentry.consumers.synchronized.SynchronizedConsumer object at 0x7fc06635a310>...
18:58:15 [INFO] arroyo.processing.processor: Processor terminated
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 177, in __check_commit_log_worker_running
self.__commit_log_worker.result()
File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/usr/local/lib/python3.11/site-packages/arroyo/utils/concurrent.py", line 31, in run
result = function()
^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 137, in __run_commit_log_worker
commit = commit_codec.decode(message.payload)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/arroyo/backends/kafka/commit.py", line 51, in decode
return self.decode_legacy(value)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/arroyo/backends/kafka/commit.py", line 84, in decode_legacy
headers["orig_message_ts"].decode("utf-8"), DATETIME_FORMAT
~~~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: 'orig_message_ts'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/sentry", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.11/site-packages/sentry/runner/main.py", line 147, in main
func(**kwargs)
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sentry/runner/decorators.py", line 83, in inner
return ctx.invoke(f, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sentry/runner/decorators.py", line 35, in inner
return ctx.invoke(f, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sentry/runner/commands/run.py", line 386, in basic_consumer
run_processor_with_signals(processor, consumer_name)
File "/usr/local/lib/python3.11/site-packages/sentry/utils/kafka.py", line 46, in run_processor_with_signals
processor.run()
File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 322, in run
self._run_once()
File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 384, in _run_once
self.__message = self.__consumer.poll(timeout=1.0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 211, in poll
self.__check_commit_log_worker_running()
File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 179, in __check_commit_log_worker_running
raise RuntimeError("commit log consumer thread crashed") from e
RuntimeError: commit log consumer thread crashed
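Both tracebacks bottom out in the same place: `decode` falls back to `decode_legacy` in arroyo's `commit.py`, which indexes `headers["orig_message_ts"]` directly. A commit-log message without that header therefore raises `KeyError`, and `SynchronizedConsumer` re-raises it as `RuntimeError("commit log consumer thread crashed")`, which takes the whole processor down. The sketch below only reproduces that failure mode in isolation; the function names are hypothetical and `DATETIME_FORMAT` is an assumed value, not necessarily arroyo's.

```python
from datetime import datetime
from typing import Mapping, Optional

# ASSUMPTION: placeholder format string; arroyo's real DATETIME_FORMAT may differ.
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_orig_message_ts(headers: Mapping[str, bytes]) -> datetime:
    """Hypothetical mirror of the failing line: a plain dict lookup,
    so a missing header raises KeyError('orig_message_ts')."""
    return datetime.strptime(headers["orig_message_ts"].decode("utf-8"), DATETIME_FORMAT)


def try_parse_orig_message_ts(headers: Mapping[str, bytes]) -> Optional[datetime]:
    """Hypothetical diagnostic variant: returns None instead of raising,
    which makes it easy to spot which messages lack the header."""
    raw = headers.get("orig_message_ts")
    if raw is None:
        return None
    return datetime.strptime(raw.decode("utf-8"), DATETIME_FORMAT)
```

A message carrying the header parses normally; an empty header map reproduces the `KeyError` seen in the log.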
Event ID
No response
Hello!
Have you tried this solution? https://github.com/getsentry/self-hosted/issues/2629#issuecomment-1846261033
Hello, yes, I added --no-strict-offset-reset to the three post-process-forwarder-* containers, but nothing changed; the error is the same. I also tried the rust-consumer replacement in docker-compose.yaml that you mentioned here, with no change.
Okay, here's another wild guess: you may be running out of server resources. What do your CPU and RAM stats look like? Is your CPU close to 100% usage? Is there at least 1 GB of free RAM? Could you bump those? My office instance has 8 CPUs + 16 GB RAM + 12 GB swap; my community instance has 6 CPUs + 16 GB RAM + 32 GB swap.
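The two rules of thumb above (at least 1 GB of free RAM, CPU not pinned near 100%) can be checked with a small script. This is a rough sketch, assuming a Linux host: it uses the 1-minute load average divided by the CPU count as a stand-in for CPU usage, and the 0.9 threshold for "too close to 100%" is an assumption, not a Sentry recommendation.

```python
import os
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Headroom:
    free_ram_bytes: int
    load_per_cpu: float  # 1-minute load average divided by CPU count


def read_headroom_linux() -> Headroom:
    """Read free RAM and per-CPU load (Linux only).

    SC_AVPHYS_PAGES counts free pages and excludes reclaimable page cache,
    so this figure is conservative compared with `free -h`'s "available".
    """
    free = os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    load1, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    return Headroom(free_ram_bytes=free, load_per_cpu=load1 / cpus)


def looks_starved(h: Headroom, min_free_bytes: int = GIB,
                  max_load_per_cpu: float = 0.9) -> bool:
    """True if free RAM is under the floor or load per CPU is at/over the
    ASSUMED 0.9 threshold."""
    return h.free_ram_bytes < min_free_bytes or h.load_per_cpu >= max_load_per_cpu
```

On the host, `looks_starved(read_headroom_linux())` gives a quick yes/no; as the next reply shows, resources turned out not to be the cause in this case.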
I know bumping server specs isn't for everyone, but hey, it's a wild guess anyway.
This instance runs on 4 CPU cores and 16 GB RAM with no swap, because it is hosted on Amazon. Two other instances with exactly the same specs, also on 24.4.2, have no problems at all. I increased the problematic instance's resources to 8 CPU cores and 32 GB RAM, and nothing changed: this container keeps crashing with the same error.
I'm also seeing constant crashes of sentry-self-hosted-post-process-forwarder-errors-1 after upgrading from 24.4.0 to 24.4.2.
Update: I restored a backup of version 24.4.0 and it still happens.
After some more debugging, it seems I've run into #2951 instead.
I'm not sure why this is happening, but maybe Kafka still holds some unprocessed legacy messages that are causing trouble? What happens if you recreate your Kafka volume? Note: this will lose any unprocessed messages.
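Before wiping the volume, one way to test the legacy-message theory is to scan the commit log topic for records that lack the `orig_message_ts` header the decoder requires. The sketch below covers only the filtering step, assuming records have already been read as `(offset, headers)` pairs, with headers in the shape confluent-kafka's `Message.headers()` returns (a list of `(key, value)` tuples, or `None`). `find_legacy_offsets` and `has_header` are hypothetical helpers, and how you consume the topic is left to you.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

# Header shape as returned by confluent-kafka's Message.headers():
# a list of (key, value) tuples, or None when the record has no headers.
Headers = Optional[Sequence[Tuple[str, Optional[bytes]]]]

REQUIRED_HEADER = "orig_message_ts"


def has_header(headers: Headers, name: str = REQUIRED_HEADER) -> bool:
    """True if `name` appears among the record's header keys."""
    if not headers:
        return False
    return any(key == name for key, _ in headers)


def find_legacy_offsets(records: Iterable[Tuple[int, Headers]]) -> List[int]:
    """Return offsets of records missing the required header, i.e. the
    ones that would make the legacy decode path raise KeyError."""
    return [offset for offset, headers in records if not has_header(headers)]
```

If this turns up offsets, recreating the Kafka volume (or otherwise skipping past those records) is a more targeted decision than a blind reset.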
This issue has gone three weeks without activity. In another week, I will close it.
But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!
"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀