self-hosted
Sentry stopped accepting transaction data
Self-Hosted Version
24.3.0.dev0
CPU Architecture
x86_64
Docker Version
24.0.4
Docker Compose Version
24.0.4
Steps to Reproduce
Update to the latest master
Expected Result
Everything works fine
Actual Result
Performance page shows zeros for the time period since the update and until now:
Project page shows the correct info about transactions and errors:
Stats page shows 49k transactions of which 49k are dropped:
Same for errors:
Event ID
No response
UPD
There are a lot of errors in the ClickHouse container:
2024.03.10 23:40:34.789282 [ 46 ] {} <Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, e.displayText() = Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below):
0. Poco::Net::SocketImpl::error(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x13c4ee8e in /usr/bin/clickhouse
1. Poco::Net::SocketImpl::peerAddress() @ 0x13c510d6 in /usr/bin/clickhouse
2. DB::ReadBufferFromPocoSocket::ReadBufferFromPocoSocket(Poco::Net::Socket&, unsigned long) @ 0x101540cd in /usr/bin/clickhouse
3. DB::HTTPServerRequest::HTTPServerRequest(std::__1::shared_ptr<DB::Context const>, DB::HTTPServerResponse&, Poco::Net::HTTPServerSession&) @ 0x110e6fd5 in /usr/bin/clickhouse
4. DB::HTTPServerConnection::run() @ 0x110e5d6e in /usr/bin/clickhouse
5. Poco::Net::TCPServerConnection::start() @ 0x13c5614f in /usr/bin/clickhouse
6. Poco::Net::TCPServerDispatcher::run() @ 0x13c57bda in /usr/bin/clickhouse
7. Poco::PooledThread::run() @ 0x13d89e59 in /usr/bin/clickhouse
8. Poco::ThreadImpl::runnableEntry(void*) @ 0x13d860ea in /usr/bin/clickhouse
9. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
10. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
(version 21.8.13.1.altinitystable (altinity build))
Also, for some reason Sentry started dropping incoming errors some time ago (as if I were using SaaS Sentry):
Did you change the port? I had the same situation when I changed the port.
Yes, I have the relay port exposed to the host network. How did you manage to fix the problem?
When I reverted the port change the problem was resolved.
Nope, didn't help. Doesn't work even with default config. Thanks for the tip though
Are there any logs in your web container that can help? Are you sure you are receiving the event envelopes? You should be able to see that activity in your nginx container.
Same here, on the browser side, there is a request sent with an event type of "transaction", but there is no data displayed under "performance", and the number of transactions in the project is also 0.
Problem solved: the server time did not match the SDK time.
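A quick way to check for the clock mismatch described above is to compare an event's timestamp with the server's current time. This is a minimal sketch, assuming the event timestamp is an ISO 8601 string in UTC. The function names and the 60-second tolerance are illustrative and not something Sentry defines:

```python
from datetime import datetime, timezone
from typing import Optional


def clock_skew_seconds(event_timestamp: str, server_now: Optional[datetime] = None) -> float:
    """Return server time minus event time, in seconds."""
    if server_now is None:
        server_now = datetime.now(timezone.utc)
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    event_time = datetime.fromisoformat(event_timestamp.replace("Z", "+00:00"))
    if event_time.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        event_time = event_time.replace(tzinfo=timezone.utc)
    return (server_now - event_time).total_seconds()


def is_skewed(event_timestamp: str, server_now: Optional[datetime] = None,
              tolerance_seconds: float = 60.0) -> bool:
    """True when the two clocks differ by more than the (hypothetical) tolerance."""
    return abs(clock_skew_seconds(event_timestamp, server_now)) > tolerance_seconds
```

A large positive or negative value suggests that the host clock or the client clock needs to be synchronized before looking at anything else.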
I can see that there are successful requests to /api/2/envelope:
Also I can see transaction statistics on the projects page:
Number 394k for the last 24 hours is about right.
Are you on a nightly version of self-hosted? What does your sentry.conf.py look like? We've added some feature flags there to support the new performance features
I'm using docker with the latest commit from this repository. Bottom of the page says Sentry 24.3.0.dev0 unknown. So I guess that's nightly.
I've updated sentry.conf.py to match the most recent version from this repo - now the only difference is in the SENTRY_SINGLE_ORGANIZATION and CSRF_TRUSTED_ORIGINS variables.
After that, errors have also disappeared:
I can confirm that the ClickHouse errors are due to the Rust workers: reverting the workers part of #2831 and #2861 made the errors disappear. However, I am still seeing far too many dropped transactions since the upgrade.
Worker code: https://github.com/getsentry/snuba/blob/359878fbe030a63945914ef05e705224680b453c/rust_snuba/src/strategies/clickhouse.rs#L61
The worker logs suggest that the insert is done (is it?): "timestamp":"2024-03-16T11:40:52.491448Z","level":"INFO","fields":{"message":"Inserted 29 rows"},
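To see whether inserts keep happening over time, the "Inserted N rows" messages can be totalled from the worker logs. This is a rough sketch, assuming each log line is a complete JSON object with the `timestamp`, `level` and `fields.message` keys seen in the fragment above (the fragment itself is quoted without its surrounding braces, so the full shape is an assumption). The function name is hypothetical:

```python
import json
import re
from typing import Iterable

# Matches messages such as "Inserted 29 rows".
INSERTED = re.compile(r"Inserted (\d+) rows")


def total_inserted_rows(lines: Iterable[str]) -> int:
    """Sum the row counts reported in 'Inserted N rows' log messages."""
    total = 0
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON (e.g. tracebacks)
        if not isinstance(record, dict):
            continue
        fields = record.get("fields")
        message = fields.get("message", "") if isinstance(fields, dict) else ""
        match = INSERTED.search(str(message))
        if match:
            total += int(match.group(1))
    return total
```

If the total stops growing while events are still arriving, the worker has probably stalled or exited, even though no error was logged.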
The error is caused by connection being prematurely closed. See https://github.com/getsentry/self-hosted/issues/2900
Same issue on latest 24.3.0
errors are not logged to
Okay, so I'm able to replicate this issue on my instance (24.3.0). What happens is that Sentry does accept transaction/errors/profiles/replays/attachments data, but it doesn't record it in the statistics. So your stats of ingested events might look as if no events were being recorded, but the events are actually there: they are processed by Snuba and you can view them in the web UI.
Can anyone reading this confirm that that's what happened on your instances as well? (I don't want to ping everybody)
If the answer to that 👆🏻 is "yes", that means something (a module, a container, or something else) that ingests the events didn't insert the data correctly for it to be queried as statistics. I don't know for sure whether it's the responsibility of the Snuba consumers (as we moved to rust-consumer just in 24.3.0) or the Sentry workers, but I'd assume it's the Snuba consumers.
A few possible solutions (well, not really, but I hope one of them gets rid of this issue):
- Fix the issue somewhere and cut a patch release.
- If it's caused by the rust-consumers, then we might roll back the usage of rust-consumer and just go back to the old Python ones.
I didn't see any errors in the Issues tab. I had to rebuild a Server Snapshot to “fix” this problem. So it wasn't just the statistics that were affected.
The description above (events are accepted and processed by Snuba, but not recorded in the statistics) seems to match my ClickHouse stats.
But the workers seem to exit for some odd reason, so I have to restart them whenever the stats show that no rows are being inserted.
If it's caused by rust-consumers, then we might rollback the usage of rust-consumer and just go back to old Python ones.
It is not the Rust consumers, because I had the same issues with version 24.2.0, which had the Python consumers.
As another data point, it appears that our Sentry instance is correctly ingesting events. However, the Stats page is showing 0 accepted/filtered/dropped since the day that rust consumers were merged into master
Hopefully this PR solves the Stats page issue.
https://github.com/getsentry/self-hosted/pull/2908
@hubertdeng123 errors also stop showing up in the Sentry UI after some time (there are gaps with no data; after that I restart the server and it starts accepting events again).
I had the same issue with 24.2.0. I migrated to 24.3.0 thinking the Rust consumers would fix it, but it looks like they don't.
Chatted with Hubert over Discord. He says that on Sentry's self-hosted dogfood instance (probably self-hosted.getsentry.dev) they're able to ingest events and the events show up, but the stats have been 0 since rust-consumer was merged to master (they use nightly).
Quoting:
I've followed up with the owners of the rust-consumers for more details, on our instance it seems to correspond to the exact days where rust-consumers were merged to master
Might need some time to find out what's wrong.
Could it be that you have Kafka hiccups on your machine? Can you see topic lags on your Kafka container?
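One way to look at topic lag is to capture the output of Kafka's `kafka-consumer-groups --describe` tool and total the `LAG` column per topic. The sketch below only parses text that has already been captured. It assumes the usual column header (`GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG ...`), and the command in the comment assumes that the tool is available in the Kafka container and that the broker is reachable as `kafka:9092`. Both should be checked against your own setup:

```python
from collections import defaultdict
from typing import Dict

# Assumed way to produce the input (verify the tool name and broker address locally):
#   docker compose exec kafka kafka-consumer-groups \
#       --bootstrap-server kafka:9092 --describe --all-groups


def lag_by_topic(describe_output: str) -> Dict[str, int]:
    """Sum the LAG column per topic from `kafka-consumer-groups --describe` text."""
    totals = defaultdict(int)
    columns = None
    for line in describe_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        if "TOPIC" in parts and "LAG" in parts:
            # Header row: remember where each column sits.
            columns = {name: index for index, name in enumerate(parts)}
            continue
        if columns is None or len(parts) <= max(columns["TOPIC"], columns["LAG"]):
            continue
        lag = parts[columns["LAG"]]
        if lag.isdigit():  # "-" means there is no committed offset yet
            totals[parts[columns["TOPIC"]]] += int(lag)
    return dict(totals)
```

A lag that keeps growing for a topic would point at a consumer that has stopped or cannot keep up, rather than at ingestion itself.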
I did not have these issues until the latest updates to the new versions (no hardware change on my side). I see that the latest versions increase the number of consumers.
Initially I thought it was happening because of defunct Java processes, which grow over time (https://github.com/getsentry/self-hosted/issues/2567#issuecomment-1997858542). Reverting the healthcheck to the previous version removed the defunct Java processes, but transactions and errors stopped showing up again yesterday.
I am ready to check whether Kafka has hiccups, just provide me with the commands ) @aldy505
BTW, I don't see a big load on the CPU. Maybe there is not enough memory (16 GB on the machine): usage is close to 100% once docker compose has started everything.
It is not rust consumers
The exception reported in the issue post is due to the Rust workers. Does anyone know why the Python workers did not hit this error about the HTTP connection closing too quickly?
I have the same error. With 24.3.0, Sentry is nonfunctional; I will revert to a backup now.
2024.03.25 13:15:44.444996 [ 47 ] {} <Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, e.displayText() = Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below):
0. Poco::Net::SocketImpl::error(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x13c4ee8e in /usr/bin/clickhouse
1. Poco::Net::SocketImpl::peerAddress() @ 0x13c510d6 in /usr/bin/clickhouse
2. DB::HTTPServerRequest::HTTPServerRequest(std::__1::shared_ptr<DB::Context const>, DB::HTTPServerResponse&, Poco::Net::HTTPServerSession&) @ 0x110e6f0b in /usr/bin/clickhouse
3. DB::HTTPServerConnection::run() @ 0x110e5d6e in /usr/bin/clickhouse
4. Poco::Net::TCPServerConnection::start() @ 0x13c5614f in /usr/bin/clickhouse
5. Poco::Net::TCPServerDispatcher::run() @ 0x13c57bda in /usr/bin/clickhouse
6. Poco::PooledThread::run() @ 0x13d89e59 in /usr/bin/clickhouse
7. Poco::ThreadImpl::runnableEntry(void*) @ 0x13d860ea in /usr/bin/clickhouse
8. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
9. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
(version 21.8.13.1.altinitystable (altinity build))
2024.03.25 13:15:45.469271 [ 47 ] {} <Error> ServerErrorHandler: Poco::Exception. Code: 1000, e.code() = 107, e.displayText() = Net Exception: Socket is not connected, Stack trace (when copying this message, always include the lines below):
0. Poco::Net::SocketImpl::error(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x13c4ee8e in /usr/bin/clickhouse
1. Poco::Net::SocketImpl::peerAddress() @ 0x13c510d6 in /usr/bin/clickhouse
2. DB::HTTPServerRequest::HTTPServerRequest(std::__1::shared_ptr<DB::Context const>, DB::HTTPServerResponse&, Poco::Net::HTTPServerSession&) @ 0x110e6f0b in /usr/bin/clickhouse
3. DB::HTTPServerConnection::run() @ 0x110e5d6e in /usr/bin/clickhouse
4. Poco::Net::TCPServerConnection::start() @ 0x13c5614f in /usr/bin/clickhouse
5. Poco::Net::TCPServerDispatcher::run() @ 0x13c57bda in /usr/bin/clickhouse
6. Poco::PooledThread::run() @ 0x13d89e59 in /usr/bin/clickhouse
7. Poco::ThreadImpl::runnableEntry(void*) @ 0x13d860ea in /usr/bin/clickhouse
8. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
9. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
(version 21.8.13.1.altinitystable (altinity build))
EDIT: switching to the non-Rust consumers fixed the problem, but the stats view is broken now.
@DarkByteZero for the broken stats view, try to add one more snuba consumer from this PR https://github.com/getsentry/self-hosted/pull/2909
After using the new snuba consumer from https://github.com/getsentry/self-hosted/pull/2909 and reverting to the python consumer, everything is working now. My statistics view is now complete again, even retroactively.
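For anyone who wants to try the same revert, the switch back to the Python consumers can be sketched as a text rewrite of the compose file. This assumes that the Snuba services are started with `rust-consumer` as the first word of their `command:` entry and that the Python equivalent is the `consumer` subcommand. Neither is confirmed in this thread, the remaining flags may differ between the two, and the function name is hypothetical:

```python
import re


def use_python_consumers(compose_text: str) -> str:
    """Replace `command: rust-consumer ...` with `command: consumer ...`.

    Assumption: `rust-consumer` directly follows `command:`; other layouts
    (for example list-style commands) are left untouched.
    """
    return re.sub(r"(command:\s*)rust-consumer\b", r"\1consumer", compose_text)
```

Review the resulting diff before restarting the containers, and keep a copy of the original file so the change is easy to undo.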
I have the same problem and sentry no longer accepts new issues.