etcd
new members added after v2->v3 data migration do not receive existing data if no .snap file exists
- Start an etcd server (versions tested: 3.1.9 - 3.2.9)
- Write v2 data, but fewer than 10,000 keys, so that no .snap file is created
- Stop the server
- Run a v2->v3 data store migration
- Restart the server
- Add new etcd members to the cluster
The new members do not receive the existing data when they join the cluster.
Observations:
- If only v3 data is written (no migration is performed), new members receive existing data without issue
- If the migrated store is restored via snapshot restore, that forces a .snap file to be created, and new members receive existing data
- If the snapshot count is lowered to force a snapshot to be taken naturally prior to migration, new members receive existing data
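Since every observation above hinges on whether a .snap file exists, a quick check of the data directory before adding members may help. This is only a minimal sketch: has_snap_file is a hypothetical helper name, and it assumes the usual layout where snapshots live under <data-dir>/member/snap.

```shell
# has_snap_file DATA_DIR (hypothetical helper, not part of etcd)
# Succeeds if DATA_DIR/member/snap holds at least one *.snap file.
# Assumes the default layout <data-dir>/member/snap.
has_snap_file() {
  data_dir="$1"
  for f in "$data_dir"/member/snap/*.snap; do
    # If the glob matched nothing, $f is the literal pattern and -e fails.
    [ -e "$f" ] && return 0
  done
  return 1
}
```

For example, `has_snap_file default.etcd || echo "no .snap file; new members may not receive migrated data"`. The bolt `db` file in the same directory does not count, since it does not end in .snap.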
recreation script at https://gist.github.com/liggitt/48d0c4460f30c18193770fd0b0816b3a
Reproducible
./bin/etcd
./bin/etcdctl set foo bar
# read key in etcd v2
./bin/etcdctl --output="json" get foo
# stop etcd node to migrate, one by one
# migrate v2 data
ETCDCTL_API=3 ./bin/etcdctl migrate --data-dir="default.etcd" --wal-dir="default.etcd/member/wal"
# restart etcd node
# confirm that the key got migrated
# ETCDCTL_API=3 ./bin/etcdctl put /foo bar
ETCDCTL_API=3 ./bin/etcdctl get /foo
# member add
ETCDCTL_API=3 ./bin/etcdctl \
member add s2 \
--peer-urls=http://localhost:22380
# start a new member
./bin/etcd --data-dir=data.etcd --name s2 \
--initial-advertise-peer-urls http://localhost:22380 \
--listen-peer-urls http://localhost:22380 \
--advertise-client-urls http://localhost:22379 \
--listen-client-urls http://localhost:22379 \
--initial-cluster default=http://localhost:2380,s2=http://localhost:22380 \
--initial-cluster-state existing
# check key
ETCDCTL_API=3 ./bin/etcdctl --endpoints=http://localhost:2379 get /foo --consistency="s"
# /foo
# bar
ETCDCTL_API=3 ./bin/etcdctl --endpoints=http://localhost:22379 get /foo --consistency="s"
# empty
The newly joined member doesn't receive data when the v3 data was migrated from v2. If we just write v3 keys without migrating, the new member receives the data fine.
Will double-check after finishing up clientv3 + gRPC issues.
Thanks for the report!
This seems more like a bug than a feature
Actually, took a look at our code base. This was never supported. Migrate command does not write Raft entries for transformed data, so the new member won't be able to receive data.
The metadata and hardState are ignored when migrating the WAL data, see migrate_command.go#L157
The V2 API has been deprecated for years, and the migrate command was removed starting from release-3.5, so it doesn't make much sense to support it any more. I suggest listing this as a known issue in the 3.4 doc and adding the workaround below to the doc.
The workaround is to perform a snapshot restore after the migration, using a command like the one below:
$ cp $data/etcd1/member/snap/db $data/tmp
$ ETCDCTL_API=3 $etcdctl snapshot restore $data/tmp \
--name "etcd1" \
--data-dir=$data/etcd1 \
--initial-advertise-peer-urls "${peer_url}" \
--initial-cluster "${initial_cluster}" \
--skip-hash-check=true
Since the migrate command has been removed starting from release-3.5, it probably makes sense to remove the doc https://etcd.io/docs/v3.5/op-guide/v2-migration/ from 3.5 as well. cc @spzala
@liggitt Do you have any comments or concerns if we update 3.4 doc (to list this as a known issue and add the workaround step) and close this issue?
v2 migration happened years ago, so I think it's safe to assume all migrated clusters have a .snap file at this point
Thanks for the feedback.
Anyone, please feel free to update the 3.4 doc per my comment above https://github.com/etcd-io/etcd/issues/8804#issuecomment-1203173546, and remove v2-migration from 3.5 and main.
Once a PR is submitted for the doc, then we can close this ticket.
@ahrtr @serathius - I can work on this issue.
Oh I see that https://github.com/etcd-io/website/pull/608 has been merged. Is there anything else required for this issue to be closed?
Resolved in https://github.com/etcd-io/website/pull/608