`Assert failure: (../../../src/v/raft/vote_stm.cc:278) '_ptr->_confirmed_term == _ptr->_term'` when a broker restarts after a failure to write to disk
Version & Environment
Redpanda version: `rpk version` reports latest (rev c8d4be2). This image was built by @travisdowns with the version name v28_df58cced6e47.
OS:

`uname -a`:
```
Linux redpanda-0 5.10.0-11-amd64 #1 SMP Debian 5.10.92-1 (2022-01-18) x86_64 GNU/Linux
```
`/etc/os-release`:
```
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
```
The issue occurred on a broker running in a self-hosted Kubernetes cluster. We're using a locally built Helm chart based on the standard Redpanda Helm chart. Each pod has a dedicated, persistent 2TB SSD formatted with XFS.
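(For completeness, the data volume's filesystem can be confirmed from inside the pod with standard util-linux tooling; a quick sketch, where the mount path matches the volumeMounts in the manifest below:)

```sh
# Show the backing device and filesystem type for the Redpanda data dir.
findmnt -no SOURCE,FSTYPE /var/lib/redpanda/data

# Capacity and usage for the same mount.
df -hT /var/lib/redpanda/data
```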
kubectl version:
```
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:16:20Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.11", GitCommit:"5824e3251d294d324320db85bf63a53eb0767af2", GitTreeState:"clean", BuildDate:"2022-06-16T05:33:55Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
```
Statefulset manifest (redacted):
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redpanda
  namespace: "redacted"
  labels:
    helm.sh/chart: redacted
    app.kubernetes.io/name: redacted
    app.kubernetes.io/instance: "redacted"
    app.kubernetes.io/managed-by: "Tiller"
    app.kubernetes.io/component: redacted
    env: prod
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: redacted
      app.kubernetes.io/instance: "redacted"
  serviceName: redpanda
  replicas: 32
  updateStrategy:
    type: OnDelete
  podManagementPolicy: "Parallel"
  template:
    metadata:
      labels:
        app.kubernetes.io/name: redacted
        app.kubernetes.io/instance: "redacted"
        app.kubernetes.io/component: redacted
        env: prod
    spec:
      securityContext:
        fsGroup: 101
      # TODO:
      # * Figure out what to do about node_id / seeds here - the operator will fix this separately
      # * Once that's done, this initContainer can be removed
      initContainers:
        - name: redpanda-configurator
          image: our-local-docker-hub/vectorized/redpanda:v28_df58cced6e47
          command: ["/bin/sh", "-c"]
          env:
            - name: SERVICE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          args:
            - >
              CONFIG=/etc/redpanda/redpanda.yaml;
              NODE_ID=${SERVICE_NAME##*-};
              cp /tmp/base-config/redpanda.yaml "$CONFIG";
              echo 1048576 > /proc/sys/fs/aio-max-nr;
              rpk --config "$CONFIG" config set redpanda.node_id $NODE_ID;
              if [ "$NODE_ID" = "0" ]; then
                rpk --config "$CONFIG" config set redpanda.seed_servers '[]' --format yaml;
              fi;
          volumeMounts:
            - name: redpanda
              mountPath: /tmp/base-config
            - name: config
              mountPath: /etc/redpanda
          resources:
            limits:
              cpu: 16
              memory: 32Gi
            requests:
              cpu: 16
              memory: 32Gi
      containers:
        - name: redpanda
          image: our-local-docker-hub/vectorized/redpanda:v28_df58cced6e47
          env:
            - name: SERVICE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          args:
            - >
              redpanda
              start
              --smp=32
              --memory=250G
              --reserve-memory=0M
              --advertise-kafka-addr=$(POD_IP):9092
              --kafka-addr=$(POD_IP):9092
              --rpc-addr=$(POD_IP):33145
              --advertise-rpc-addr=$(POD_IP):33145
              --default-log-level=error
              --blocked-reactor-notify-ms=200
              --abort-on-seastar-bad-alloc
              --logger-log-level=seastar_memory=trace
              --max-networking-io-control-blocks=30000
          ports:
            - containerPort: 9644
              name: admin
            - containerPort: 9092
              name: kafka
            - containerPort: 33145
              name: rpc
          volumeMounts:
            - name: datadir
              mountPath: /var/lib/redpanda/data
            - name: config
              mountPath: /etc/redpanda
          resources:
            limits:
              memory: 256Gi
            requests:
              cpu: 32
              memory: 256Gi
      volumes:
        - name: datadir
          persistentVolumeClaim:
            claimName: datadir
        - name: redpanda
          configMap:
            name: redpanda
        - name: config
          emptyDir: {}
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: redacted
                    app.kubernetes.io/instance: "redacted"
      priorityClassName: solidio-localdisk
  volumeClaimTemplates:
    - metadata:
        name: datadir
        labels:
          app.kubernetes.io/name: redacted
          app.kubernetes.io/instance: "redacted"
          app.kubernetes.io/component: redacted
          env: prod
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "static-full-disk-xfs"
        resources:
          requests:
            storage: "300Gi"
```
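(As an aside for readers: the initContainer above derives each broker's node_id from the StatefulSet pod ordinal using POSIX parameter expansion. A minimal sketch of that logic, with a hypothetical pod name:)

```sh
# '##*-' strips the longest prefix ending in '-', leaving just the
# StatefulSet ordinal, which is reused as the Redpanda node_id.
SERVICE_NAME=redpanda-17   # hypothetical pod name, injected via the Downward API
NODE_ID=${SERVICE_NAME##*-}
echo "$NODE_ID"            # -> 17
```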
We're using the Python confluent-kafka library (librdkafka-based).
What went wrong?
After running stably for weeks, we encountered this assert 3 times (with different values for fallocation_offset and committed_offset) on one broker, followed immediately by that broker crashing:
```
Assert failure: (../../../src/v/storage/segment_appender.cc:507) 'false' Could not dma_write: std::__1::system_error (error system:5, Input/output error) {no_of_chunks:64, closed:0, fallocation_offset:33554432, committed_offset:11748024, bytes_flush_pending:0}
```
The root cause of this write failure is not known; we suspect an issue on the host system rather than within the Redpanda container.
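(For context, these are the kinds of host-level checks that can confirm a disk problem behind an EIO like this; a generic sketch assuming standard Linux tooling (smartmontools, xfsprogs) is installed, with a placeholder device name:)

```sh
# Look for kernel-reported I/O errors around the crash window.
dmesg -T | grep -iE 'i/o error|blk_update_request|xfs'

# Check SMART health on the suspect device (placeholder name).
smartctl -a /dev/sdX

# Dry-run XFS consistency check (requires the filesystem to be unmounted).
xfs_repair -n /dev/sdX
```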
Kubernetes restarted the pod using the same PV, but the broker was unable to recover and began crashlooping. Each crash hit the following assert/callstack:
```
ERROR 2022-11-20 03:33:13,754782 [shard 6 seq 1] assert - Assert failure: (../../../src/v/raft/vote_stm.cc:278) '_ptr->_confirmed_term == _ptr->_term' successfully replicated configuration should update _confirmed_term=-9223372036854775808 to be equal to _term=43
ERROR 2022-11-20 03:33:13,754867 [shard 6 seq 2] assert - Backtrace below:
0x4d0b484 0x1ea0e59 0x4abc1df 0x4abfeb7 0x4b033b5 0x4a5d19f /opt/redpanda/lib/libpthread.so.0+0x8608 /opt/redpanda/lib/libc.so.6+0x11f132
--------
seastar::continuation<seastar::internal::promise_base_with_type<void>, raft::vote_stm::update_vote_state(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_6, seastar::future<void> seastar::future<std::__1::error_code>::then_impl_nrvo<raft::vote_stm::update_vote_state(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_6, seastar::future<void> >(raft::vote_stm::update_vote_state(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_6&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, raft::vote_stm::update_vote_state(seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>)::$_6&, seastar::future_state<std::__1::error_code>&&), std::__1::error_code>
--------
seastar::continuation<seastar::internal::promise_base_with_type<void>, raft::consensus::dispatch_vote(bool)::$_11::operator()() const::'lambda'(bool)::operator()(bool)::'lambda'(seastar::future<void>), seastar::futurize<seastar::future<void> >::type seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, raft::consensus::dispatch_vote(bool)::$_11::operator()() const::'lambda'(bool)::operator()(bool)::'lambda'(seastar::future<void>)>(raft::consensus::dispatch_vote(bool)::$_11::operator()() const::'lambda'(bool)::operator()(bool)::'lambda'(seastar::future<void>)&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, raft::consensus::dispatch_vote(bool)::$_11::operator()() const::'lambda'(bool)::operator()(bool)::'lambda'(seastar::future<void>)&, seastar::future_state<seastar::internal::monostate>&&), void>
--------
seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<raft::consensus::dispatch_vote(bool)::$_11::operator()() const::'lambda'(), false>, seastar::futurize<seastar::future<void> >::type seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void>::finally_body<raft::consensus::dispatch_vote(bool)::$_11::operator()() const::'lambda'(), false> >(seastar::future<void>::finally_body<raft::consensus::dispatch_vote(bool)::$_11::operator()() const::'lambda'(), false>&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::finally_body<raft::consensus::dispatch_vote(bool)::$_11::operator()() const::'lambda'(), false>&, seastar::future_state<seastar::internal::monostate>&&), void>
--------
seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<auto seastar::internal::invoke_func_with_gate<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(), false>, seastar::futurize<raft::consensus::dispatch_vote(bool)::$_11>::type seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void>::finally_body<auto seastar::internal::invoke_func_with_gate<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(), false> >(seastar::future<void>::finally_body<auto seastar::internal::invoke_func_with_gate<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(), false>&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::finally_body<auto seastar::internal::invoke_func_with_gate<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(), false>&, seastar::future_state<seastar::internal::monostate>&&), void>
--------
seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::abort_requested_exception const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&), seastar::futurize<raft::consensus::dispatch_vote(bool)::$_11>::type seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::abort_requested_exception const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&)>(seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::abort_requested_exception const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&)&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::abort_requested_exception const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&)&, seastar::future_state<seastar::internal::monostate>&&), void>
--------
seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::gate_closed_exception const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&), seastar::futurize<raft::consensus::dispatch_vote(bool)::$_11>::type seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::gate_closed_exception const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&)>(seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::gate_closed_exception const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&)&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::gate_closed_exception const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&)&, seastar::future_state<seastar::internal::monostate>&&), void>
--------
seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::broken_semaphore const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&), seastar::futurize<raft::consensus::dispatch_vote(bool)::$_11>::type seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::broken_semaphore const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&)>(seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::broken_semaphore const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&)&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::broken_semaphore const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&)&, seastar::future_state<seastar::internal::monostate>&&), void>
--------
seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::broken_condition_variable const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&), seastar::futurize<raft::consensus::dispatch_vote(bool)::$_11>::type seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::broken_condition_variable const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&)>(seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::broken_condition_variable const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&)&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, seastar::future<void> seastar::future<void>::handle_exception_type<auto ssx::spawn_with_gate_then<raft::consensus::dispatch_vote(bool)::$_11>(seastar::gate&, raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(seastar::broken_condition_variable const&)>(raft::consensus::dispatch_vote(bool)::$_11&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_11&&)&, seastar::future_state<seastar::internal::monostate>&&), void>
--------
seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void> seastar::future<void>::handle_exception<raft::consensus::dispatch_vote(bool)::$_51>(raft::consensus::dispatch_vote(bool)::$_51&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_51&&), seastar::futurize<raft::consensus::dispatch_vote(bool)::$_51>::type seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void> seastar::future<void>::handle_exception<raft::consensus::dispatch_vote(bool)::$_51>(raft::consensus::dispatch_vote(bool)::$_51&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_51&&)>(seastar::future<void> seastar::future<void>::handle_exception<raft::consensus::dispatch_vote(bool)::$_51>(raft::consensus::dispatch_vote(bool)::$_51&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_51&&)&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, seastar::future<void> seastar::future<void>::handle_exception<raft::consensus::dispatch_vote(bool)::$_51>(raft::consensus::dispatch_vote(bool)::$_51&&)::'lambda'(raft::consensus::dispatch_vote(bool)::$_51&&)&, seastar::future_state<seastar::internal::monostate>&&), void>
```
I was able to stop the crashlooping by putting the broker into maintenance mode. It was then able to join the cluster and remain a "healthy" member. We have not tried to disable maintenance mode on that broker.
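(For reference, maintenance mode was toggled with rpk's cluster maintenance commands; a minimal sketch, where the node id is a placeholder and exact flags vary by rpk version:)

```sh
# Enable maintenance mode on the affected broker (hypothetical node id 7);
# this drains partition leadership away from it.
rpk cluster maintenance enable 7

# Watch drain progress / maintenance status across the cluster.
rpk cluster maintenance status

# Take the broker back out of maintenance mode.
rpk cluster maintenance disable 7
```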
What should have happened instead?
After the write failure and the pod restart, Redpanda should have been able to recover, and the broker should have rejoined the cluster correctly.
How to reproduce the issue?
Unknown.
Additional information
All metrics for that broker up to the point of the initial crash were in line with the other brokers in the cluster. There is a Slack thread with some discussion.
I'm really interested to know why the broker was unable to recover, and what impact disabling maintenance mode on the broker might have.
SolidIO
Can you say more about what this is? I googled "SolidIO", "SolidIO storage", etc. and couldn't find any references at all.
As it turns out, that's the name of the internally developed persistent-disk management system we use within Kubernetes (I had thought it was external). It just handles PV lifetimes and the like. You can ignore it.
We made an attempt to bring the broker out of maintenance mode. We can see from metrics that leadership of partitions was transferred to it, but it started hitting the same assert again, and we had to put it back into maintenance mode to prevent further impact.