Connection refused error when one ClickHouse server is down in a replicated ClickHouse cluster

Open YukJiSoo opened this issue 1 year ago • 0 comments

Self-Hosted Version

22.11.0

CPU Architecture

x86_64

Docker Version

20.10.17

Docker Compose Version

2.9.0

Steps to Reproduce

  1. Take down one ClickHouse server in the replicated cluster.
  2. Check the logs in the snuba container.

Expected Result

  • The ClickHouse servers are clustered, so writes should keep working when one server is down.

Actual Result

  • Snuba logs errors and data is not saved correctly:
sentry-self-hosted-snuba-transactions-consumer-1               | 2024-02-23 06:50:02,072 Terminating <snuba.consumers.consumer.ProcessedMessageBatchWriter object at 0x7fc8f93e4820>...
sentry-self-hosted-snuba-transactions-consumer-1               | 2024-02-23 06:50:02,073 Closing <snuba.utils.streams.kafka_consumer_with_commit_log.KafkaConsumerWithCommitLog object at 0x7fc8f9509dc0>...
sentry-self-hosted-snuba-transactions-consumer-1               | 2024-02-23 06:50:02,073 Partitions revoked: [Partition(topic=Topic(name='transactions'), index=7), Partition(topic=Topic(name='transactions'), index=8), Partition(topic=Topic(name='transactions'), index=9), Partition(topic=Topic(name='transactions'), index=10), Partition(topic=Topic(name='transactions'), index=11), Partition(topic=Topic(name='transactions'), index=12), Partition(topic=Topic(name='transactions'), index=13)]
sentry-self-hosted-snuba-transactions-consumer-1               | 2024-02-23 06:50:02,083 Processor terminated
sentry-self-hosted-snuba-transactions-consumer-1               | Traceback (most recent call last):
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/bin/snuba", line 33, in <module>
sentry-self-hosted-snuba-transactions-consumer-1               |     sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
sentry-self-hosted-snuba-transactions-consumer-1               |     return self.main(*args, **kwargs)
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1055, in main
sentry-self-hosted-snuba-transactions-consumer-1               |     rv = self.invoke(ctx)
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
sentry-self-hosted-snuba-transactions-consumer-1               |     return _process_result(sub_ctx.command.invoke(sub_ctx))
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
sentry-self-hosted-snuba-transactions-consumer-1               |     return ctx.invoke(self.callback, **ctx.params)
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
sentry-self-hosted-snuba-transactions-consumer-1               |     return __callback(*args, **kwargs)
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/src/snuba/snuba/cli/consumer.py", line 202, in consumer
sentry-self-hosted-snuba-transactions-consumer-1               |     consumer.run()
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 180, in run
sentry-self-hosted-snuba-transactions-consumer-1               |     self._run_once()
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 220, in _run_once
sentry-self-hosted-snuba-transactions-consumer-1               |     self.__processing_strategy.poll()
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/transform.py", line 63, in poll
sentry-self-hosted-snuba-transactions-consumer-1               |     self.__next_step.poll()
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/collect.py", line 135, in poll
sentry-self-hosted-snuba-transactions-consumer-1               |     self.close_and_reset_batch()
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/collect.py", line 201, in close_and_reset_batch
sentry-self-hosted-snuba-transactions-consumer-1               |     self.future.result(timeout=self.wait_timeout)
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
sentry-self-hosted-snuba-transactions-consumer-1               |     return self.__get_result()
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
sentry-self-hosted-snuba-transactions-consumer-1               |     raise self._exception
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
sentry-self-hosted-snuba-transactions-consumer-1               |     result = self.fn(*self.args, **self.kwargs)
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/collect.py", line 213, in __finish_batch
sentry-self-hosted-snuba-transactions-consumer-1               |     batch.close()
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/collect.py", line 66, in close
sentry-self-hosted-snuba-transactions-consumer-1               |     self.__step.close()
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/src/snuba/snuba/consumers/consumer.py", line 288, in close
sentry-self-hosted-snuba-transactions-consumer-1               |     self.__insert_batch_writer.close()
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/src/snuba/snuba/consumers/consumer.py", line 104, in close
sentry-self-hosted-snuba-transactions-consumer-1               |     self.__writer.write(
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/src/snuba/snuba/clickhouse/http.py", line 274, in write
sentry-self-hosted-snuba-transactions-consumer-1               |     batch.join()
sentry-self-hosted-snuba-transactions-consumer-1               |   File "/usr/src/snuba/snuba/clickhouse/http.py", line 224, in join
sentry-self-hosted-snuba-transactions-consumer-1               |     raise HTTPError(
sentry-self-hosted-snuba-transactions-consumer-1               | urllib3.exceptions.HTTPError: Received unexpected 500 response: Code: 210. DB::NetException: Connection refused (rc-22-clickhouse001-myteam-jp2v-prod.corpinfra.com:9000): Insertion status:
sentry-self-hosted-snuba-transactions-consumer-1               | Wrote 0 blocks and 0 rows on shard 0 replica 0, rc-22-clickhouse001-myteam-jp2v-prod.corpinfra.com:9000 (average 0 ms per block)
sentry-self-hosted-snuba-transactions-consumer-1               | Wrote 1 blocks and 112 rows on shard 0 replica 1, rc-22-clickhouse002-myteam-jp2v-prod.corpinfra.com:9000 (average 4 ms per block)
sentry-self-hosted-snuba-transactions-consumer-1               | Wrote 1 blocks and 230 rows on shard 1 replica 0, rc-22-clickhouse003-myteam-jp2v-prod.corpinfra.com:9000 (average 4 ms per block)
sentry-self-hosted-snuba-transactions-consumer-1               | Wrote 1 blocks and 230 rows on shard 1 replica 1, rc-22-clickhouse004-myteam-jp2v-prod.corpinfra.com:9000 (average 4 ms per block)
sentry-self-hosted-snuba-transactions-consumer-1               | Wrote 1 blocks and 92 rows on shard 2 replica 0, rc-22-clickhouse005-myteam-jp2v-prod.corpinfra.com:9000 (average 5 ms per block)
sentry-self-hosted-snuba-transactions-consumer-1               | Wrote 1 blocks and 92 rows on shard 2 replica 0, rc-22-clickhouse005-myteam-jp2v-prod.corpinfra.com:9000 (average 1 ms per block)
sentry-self-hosted-snuba-transactions-consumer-1               | . (NETWORK_ERROR) (version 22.1.3.7 (official build))
  • After the server was taken down, the following issues occurred:
    • Issue inflow seemed fine for some time after the server went down
    • After about 5 minutes, issue inflow slowed down (3-4s -> 6-9s)
    • CPU usage spiked on snuba, and memory usage dropped slightly
    • Kafka topic metrics and disk usage spiked
    • CPU and disk I/O spiked on ClickHouse and ClickHouse ZooKeeper
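
To make the "Insertion status" part of the log above easier to read, here is a minimal Python sketch that pulls out the per-replica write counts and lists the replicas that received nothing. The helper names (`parse_insertion_status`, `replicas_without_writes`) and the regular expression are my own assumptions based only on the line format shown in this log; they are not part of Snuba or ClickHouse, and the format may differ in other ClickHouse versions.

```python
import re
from typing import List, NamedTuple


class ReplicaWrite(NamedTuple):
    """One 'Wrote N blocks and M rows on shard S replica R, host' line."""
    blocks: int
    rows: int
    shard: int
    replica: int
    host: str


# Assumed format, taken from the log lines in this report.
STATUS_RE = re.compile(
    r"Wrote (\d+) blocks and (\d+) rows on shard (\d+) replica (\d+), (\S+)"
)


def parse_insertion_status(log_text: str) -> List[ReplicaWrite]:
    """Extract every per-replica status line found in the log text."""
    return [
        ReplicaWrite(int(blocks), int(rows), int(shard), int(replica), host)
        for blocks, rows, shard, replica, host in STATUS_RE.findall(log_text)
    ]


def replicas_without_writes(log_text: str) -> List[ReplicaWrite]:
    """Return the replicas that wrote zero blocks for this insert."""
    return [w for w in parse_insertion_status(log_text) if w.blocks == 0]


# Two lines copied from the log above, including the container prefix.
SAMPLE = (
    "sentry-self-hosted-snuba-transactions-consumer-1               | "
    "Wrote 0 blocks and 0 rows on shard 0 replica 0, "
    "rc-22-clickhouse001-myteam-jp2v-prod.corpinfra.com:9000 (average 0 ms per block)\n"
    "sentry-self-hosted-snuba-transactions-consumer-1               | "
    "Wrote 1 blocks and 112 rows on shard 0 replica 1, "
    "rc-22-clickhouse002-myteam-jp2v-prod.corpinfra.com:9000 (average 4 ms per block)\n"
)

if __name__ == "__main__":
    for w in replicas_without_writes(SAMPLE):
        print(f"shard {w.shard} replica {w.replica} ({w.host}) wrote no blocks")
```

In the log from this report, the only replica with zero blocks is shard 0 replica 0, which is the same host named in the `Connection refused` message; the other replica of that shard did receive the rows, yet the insert as a whole still came back as a 500 error.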

Event ID

No response

YukJiSoo, Feb 23 '24 07:02