'Replication health' continues to be in the '(Re)initializing automatic data distribution' phase
FDB version: 7.2.0. Cluster size: 18 nodes in total, 6 nodes per DC. Redundancy mode: three_datacenter.
Here is what I see in fdbcli:
# fdbcli
Using cluster file `/etc/foundationdb/fdb.cluster'.
The database is available, but has issues (type 'status' for more information).
Welcome to the fdbcli. For help, type `help'.
fdb>
fdb>
fdb> status
Using cluster file `/etc/foundationdb/fdb.cluster'.
Configuration:
Redundancy mode - three_datacenter
Storage engine - ssd-2
Encryption at-rest - disabled
Coordinators - 7
Usable Regions - 1
Cluster:
FoundationDB processes - 502
Zones - 502
Machines - 18
Memory availability - 8.0 GB per process on machine with least available
Retransmissions rate - 78 Hz
Fault Tolerance - 3 zones
Server time - 01/16/23 10:34:52
Data:
Replication health - (Re)initializing automatic data distribution
Moving data - unknown (initializing)
Sum of key-value sizes - unknown
Disk space used - 50.634 GB
Operating space:
Storage server - 1697.8 GB free on most full server
Log server - 1679.8 GB free on most full server
Workload:
Read rate - 303 Hz
Write rate - 87 Hz
Transactions started - 104 Hz
Transactions committed - 87 Hz
Conflict rate - 0 Hz
Backup and DR:
Running backups - 0
Running DRs - 0
Client time: 01/16/23 10:34:52
fdb>
The data_distributor process log has the following error:
<Event Severity="20" Time="1673429958.603785" DateTime="2023-01-11T09:39:18Z" Type="Net2RunLoopTrace" ID="0000000000000000" TraceTime="1673429960.710971" Trace="addr2line -e fdbserver.debug -p -C -f -i 0x7faca699d630 0x4680045 0x1794715 0x174c456 0x1765979 0x176b7ac 0x17c6706 0x17c6b73 0x17c6c5b 0x17ce0e5 0x15a3040 0x438bb32 0xc82397 0x7faca65e2555 0xd01412" ThreadID="8092011216532190504" Machine="10.181.159.65:7500" LogGroup="default" Roles="DD" />
# addr2line -e fdbserver.debug -p -C -f -i 0x7faca699d630 0x4680045 0x1794715 0x174c456 0x1765979 0x176b7ac 0x17c6706 0x17c6b73 0x17c6c5b 0x17ce0e5 0x15a3040 0x438bb32 0xc82397 0x7faca65e2555 0xd01412
?? ??:0
free_fastpath at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/Jemalloc_project-prefix/src/Jemalloc_project/src/jemalloc.c:3085
(inlined by) free at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/Jemalloc_project-prefix/src/Jemalloc_project/src/jemalloc.c:3161
__gnu_cxx::new_allocator<std::_Rb_tree_node<std::string> >::deallocate(std::_Rb_tree_node<std::string>*, unsigned long) at /opt/rh/devtoolset-8/root/usr/include/c++/8/ext/new_allocator.h:125
(inlined by) std::allocator_traits<std::allocator<std::_Rb_tree_node<std::string> > >::deallocate(std::allocator<std::_Rb_tree_node<std::string> >&, std::_Rb_tree_node<std::string>*, unsigned long) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/alloc_traits.h:462
(inlined by) std::_Rb_tree<std::string, std::string, std::_Identity<std::string>, std::less<std::string>, std::allocator<std::string> >::_M_put_node(std::_Rb_tree_node<std::string>*) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:603
(inlined by) std::_Rb_tree<std::string, std::string, std::_Identity<std::string>, std::less<std::string>, std::allocator<std::string> >::_M_drop_node(std::_Rb_tree_node<std::string>*) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:670
(inlined by) std::_Rb_tree<std::string, std::string, std::_Identity<std::string>, std::less<std::string>, std::allocator<std::string> >::_M_erase(std::_Rb_tree_node<std::string>*) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:1874
std::string::_M_rep() const at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h:3322
(inlined by) std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.h:3640
(inlined by) void __gnu_cxx::new_allocator<std::_Rb_tree_node<std::string> >::destroy<std::string>(std::string*) at /opt/rh/devtoolset-8/root/usr/include/c++/8/ext/new_allocator.h:140
(inlined by) void std::allocator_traits<std::allocator<std::_Rb_tree_node<std::string> > >::destroy<std::string>(std::allocator<std::_Rb_tree_node<std::string> >&, std::string*) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/alloc_traits.h:487
(inlined by) std::_Rb_tree<std::string, std::string, std::_Identity<std::string>, std::less<std::string>, std::allocator<std::string> >::_M_destroy_node(std::_Rb_tree_node<std::string>*) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:661
(inlined by) std::_Rb_tree<std::string, std::string, std::_Identity<std::string>, std::less<std::string>, std::allocator<std::string> >::_M_drop_node(std::_Rb_tree_node<std::string>*) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:669
(inlined by) std::_Rb_tree<std::string, std::string, std::_Identity<std::string>, std::less<std::string>, std::allocator<std::string> >::_M_erase(std::_Rb_tree_node<std::string>*) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:1874
(inlined by) std::_Rb_tree<std::string, std::string, std::_Identity<std::string>, std::less<std::string>, std::allocator<std::string> >::~_Rb_tree() at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:965
(inlined by) std::set<std::string, std::less<std::string>, std::allocator<std::string> >::~set() at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_set.h:281
(inlined by) DDTeamCollection::isValidLocality(Reference<IReplicationPolicy>, LocalityData const&) const at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbserver/DDTeamCollection.actor.cpp:3696
Reference<IReplicationPolicy>::~Reference() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/include/flow/FastRef.h:125 (discriminator 1)
(inlined by) DDTeamCollection::addBestMachineTeams(int) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbserver/DDTeamCollection.actor.cpp:4068 (discriminator 1)
DDTeamCollection::addTeamsBestOf(int, int, int) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbserver/DDTeamCollection.actor.cpp:4479
DDTeamCollectionImpl::BuildTeamsActorState<DDTeamCollectionImpl::BuildTeamsActor>::a_body1cont2(int) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbserver/DDTeamCollection.actor.cpp:599
DDTeamCollectionImpl::BuildTeamsActorState<DDTeamCollectionImpl::BuildTeamsActor>::a_body1cont1break1(int) at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbserver/DDTeamCollection.actor.g.cpp:3618
DDTeamCollectionImpl::BuildTeamsActorState<DDTeamCollectionImpl::BuildTeamsActor>::a_body1cont1loopBody1(int) at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbserver/DDTeamCollection.actor.g.cpp:3605 (discriminator 5)
DDTeamCollectionImpl::BuildTeamsActorState<DDTeamCollectionImpl::BuildTeamsActor>::a_body1cont1loopHead1(int) at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbserver/DDTeamCollection.actor.g.cpp:3574
(inlined by) DDTeamCollectionImpl::BuildTeamsActorState<DDTeamCollectionImpl::BuildTeamsActor>::a_body1cont1(Void const&, int) at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbserver/DDTeamCollection.actor.g.cpp:3365
(inlined by) DDTeamCollectionImpl::BuildTeamsActorState<DDTeamCollectionImpl::BuildTeamsActor>::a_body1when1(Void const&, int) at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbserver/DDTeamCollection.actor.g.cpp:3380
(inlined by) DDTeamCollectionImpl::BuildTeamsActorState<DDTeamCollectionImpl::BuildTeamsActor>::a_callback_fire(ActorCallback<DDTeamCollectionImpl::BuildTeamsActor, 0, Void>*, Void const&) at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbserver/DDTeamCollection.actor.g.cpp:3401
(inlined by) ActorCallback<DDTeamCollectionImpl::BuildTeamsActor, 0, Void>::fire(Void const&) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/include/flow/flow.h:1313
void SAV<Void>::send<Void>(Void&&) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/include/flow/flow.h:654
Promise<Void>::~Promise() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/include/flow/flow.h:922
(inlined by) N2::Net2::PromiseTask::~PromiseTask() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/Net2.actor.cpp:253
(inlined by) N2::Net2::PromiseTask::operator()() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/Net2.actor.cpp:260
(inlined by) N2::Net2::run() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/Net2.actor.cpp:1492
main at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbserver/fdbserver.actor.cpp:2310 (discriminator 4)
?? ??:0
_start at ??:?
What could be the reason?
Looks like you have 18 undesired storage servers. Were they all excluded? If so, you may want to either include them again or add more storage servers.
DD found no healthy team in the system. Once you make one of the changes suggested above, DD should be able to build healthy teams.
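One way to check whether any storage servers are marked excluded is to look at the machine-readable status. The following is a minimal sketch, not an official tool. It assumes the JSON printed by fdbcli's status json command has already been parsed into a dict, and that each entry under cluster.processes carries "roles", "excluded" and "address" fields as in the documented status schema. The function name and the sample data are hypothetical.

```python
def storage_exclusion_summary(status):
    """Count storage-role processes and list the ones marked excluded.

    `status` is assumed to be the parsed output of fdbcli's `status json`.
    """
    processes = status.get("cluster", {}).get("processes", {})
    total = 0
    excluded = []
    for proc in processes.values():
        # Each process may hold several roles; keep only storage servers.
        roles = {r.get("role") for r in proc.get("roles", [])}
        if "storage" not in roles:
            continue
        total += 1
        if proc.get("excluded", False):
            excluded.append(proc.get("address", "unknown"))
    return {"storage": total, "excluded": sorted(excluded)}


# Hypothetical sample, shaped like a small slice of `status json`.
sample_status = {
    "cluster": {
        "processes": {
            "p1": {"address": "10.0.0.1:4500", "excluded": True,
                   "roles": [{"role": "storage"}]},
            "p2": {"address": "10.0.0.2:4500", "excluded": False,
                   "roles": [{"role": "storage"}, {"role": "log"}]},
            "p3": {"address": "10.0.0.3:4500", "excluded": False,
                   "roles": [{"role": "log"}]},
        }
    }
}

print(storage_exclusion_summary(sample_status))
```

On a real cluster the dict would come from something like json.loads applied to the output of fdbcli --exec 'status json'. An empty "excluded" list would suggest that exclusion is not the reason the servers are undesired.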
Looks like you have 18 undesired storage servers. Were they all excluded?
No, they are not excluded.
If I run fdbtop process issue, it reports that all processes are healthy.
Your configuration likely has a problem, for example in the locality settings or the DC IDs. It's hard to tell without looking at them.
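Since the status output above reports 502 zones for 502 processes on 18 machines, grouping the processes by locality is one way to see how the DC, zone and machine IDs are actually assigned. The sketch below is only an illustration. It assumes the parsed status json output and the locality keys "dcid", "zoneid" and "machineid" under each process. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def group_by_locality(status):
    """Summarize process, zone and machine counts per datacenter ID.

    `status` is assumed to be the parsed output of fdbcli's `status json`.
    Processes without a dcid are grouped under "<unset>".
    """
    processes = status.get("cluster", {}).get("processes", {})
    per_dc = defaultdict(
        lambda: {"processes": 0, "zones": set(), "machines": set()}
    )
    for proc in processes.values():
        loc = proc.get("locality", {})
        dc = loc.get("dcid") or "<unset>"
        entry = per_dc[dc]
        entry["processes"] += 1
        entry["zones"].add(loc.get("zoneid"))
        entry["machines"].add(loc.get("machineid"))
    # Convert the sets to counts so the result is easy to print or compare.
    return {
        dc: {
            "processes": e["processes"],
            "zones": len(e["zones"]),
            "machines": len(e["machines"]),
        }
        for dc, e in per_dc.items()
    }


# Hypothetical sample: two processes on one machine in dc1, one without a dcid.
sample_status = {
    "cluster": {
        "processes": {
            "p1": {"locality": {"dcid": "dc1", "zoneid": "z1", "machineid": "m1"}},
            "p2": {"locality": {"dcid": "dc1", "zoneid": "z2", "machineid": "m1"}},
            "p3": {"locality": {"zoneid": "z3", "machineid": "m2"}},
        }
    }
}

for dc, counts in sorted(group_by_locality(sample_status).items()):
    print(dc, counts)
```

For a three_datacenter cluster like the one described, you would expect three DC IDs with 6 machines each. An "<unset>" group or a different number of DC IDs would be worth a closer look.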
The DD crash issue can be reproduced: https://forums.foundationdb.org/t/dd-crashed-when-storage-servers-exceed-1200/4144/3