Tags vanish when snapshots are created in parallel
Snapshot tags vanish when several snapshots are created in parallel.
Environment: CentOS 7.2, Sheepdog v1.0.1
- serial (no problem)
[root@cent01 ~]# dog vdi list
  Name         Id  Size    Used    Shared  Creation time     VDI id  Copies  Tag    Block Size Shift
  test_vdi01    0  1.0 GB  0.0 MB  0.0 MB  2016-10-27 20:15  31b5e2       3         22
[root@cent01 ~]# dog vdi snapshot -s aaa test_vdi01
[root@cent01 ~]# dog vdi snapshot -s bbb test_vdi01
[root@cent01 ~]# dog vdi snapshot -s ccc test_vdi01
[root@cent01 ~]# dog vdi snapshot -s ddd test_vdi01
[root@cent01 ~]# dog vdi snapshot -s eee test_vdi01
[root@cent01 ~]# dog vdi list
  Name         Id  Size    Used    Shared  Creation time     VDI id  Copies  Tag    Block Size Shift
s test_vdi01    7  1.0 GB  0.0 MB  0.0 MB  2016-10-27 20:15  31b5e2       3  aaa    22
s test_vdi01    8  1.0 GB  0.0 MB  0.0 MB  2016-10-27 20:21  31b5e3       3  bbb    22
s test_vdi01    9  1.0 GB  0.0 MB  0.0 MB  2016-10-27 20:21  31b5e4       3  ccc    22
s test_vdi01   10  1.0 GB  0.0 MB  0.0 MB  2016-10-27 20:21  31b5e5       3  ddd    22
s test_vdi01   11  1.0 GB  0.0 MB  0.0 MB  2016-10-27 20:21  31b5e6       3  eee    22
  test_vdi01    0  1.0 GB  0.0 MB  0.0 MB  2016-10-27 20:21  31b5e7       3         22
- parallel (problem)
[root@cent01 ~]# dog vdi list
  Name         Id  Size    Used    Shared  Creation time     VDI id  Copies  Tag    Block Size Shift
  test_vdi01    0  1.0 GB  0.0 MB  0.0 MB  2016-10-27 20:13  31b5dd       3         22
[root@cent01 ~]# dog vdi snapshot -s aaa test_vdi01 & dog vdi snapshot -s bbb test_vdi01 & dog vdi snapshot -s ccc test_vdi01 & dog vdi snapshot -s ddd test_vdi01 & dog vdi snapshot -s eee test_vdi01 &
[root@cent01 ~]# dog vdi list
  Name         Id  Size    Used    Shared  Creation time     VDI id  Copies  Tag    Block Size Shift
s test_vdi01    2  1.0 GB  0.0 MB  0.0 MB  2016-10-27 20:13  31b5dd       3  aaa    22
s test_vdi01    3  1.0 GB  0.0 MB  0.0 MB  2016-10-27 20:15  31b5de       3         22
s test_vdi01    4  1.0 GB  0.0 MB  0.0 MB  2016-10-27 20:15  31b5df       3         22
s test_vdi01    5  1.0 GB  0.0 MB  0.0 MB  2016-10-27 20:15  31b5e0       3         22
s test_vdi01    6  1.0 GB  0.0 MB  0.0 MB  2016-10-27 20:15  31b5e1       3         22
  test_vdi01    0  1.0 GB  0.0 MB  0.0 MB  2016-10-27 20:15  31b5e2       3         22
@fuku-ys: Thank you for the report.
I can reproduce this issue with 1.0_65_gb6d64b2 on Ubuntu 16.04.
I think it is caused by a lack of atomicity.
In brief, the 'dog vdi snapshot -s tag' procedure consists of the following three steps:
- Get the ID of the working VDI; call it V (dog/vdi.c#L778)
- Write the tag to the inode of V (dog/vdi.c#L806)
- Create the snapshot, that is, turn the working VDI V into a snapshot and create a new working VDI (dog/vdi.c#L815)
These three steps are not performed atomically.
So when the command runs in parallel, every 'dog vdi snapshot -s tag' process can get the same VDI ID. In that case, the tag written by one 'dog ...' process is overwritten by another at step 2. That is why the tags appear to vanish.
The verbose output of a parallel 'dog vdi snapshot -v -s tag' run, below, supports this: every 'dog ...' process got the same VDI ID.
new VID of original VDI: 31b5dd, VDI ID of newly created snapshot: 31b5dc
new VID of original VDI: 31b5de, VDI ID of newly created snapshot: 31b5dc
new VID of original VDI: 31b5df, VDI ID of newly created snapshot: 31b5dc
new VID of original VDI: 31b5e0, VDI ID of newly created snapshot: 31b5dc
new VID of original VDI: 31b5e1, VDI ID of newly created snapshot: 31b5dc
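The race described above can be reproduced deterministically with a small, single-process model of the three steps. This is a hypothetical sketch, not Sheepdog code. The names toy_cluster, toy_inode and the step_*/run_* helpers are invented for illustration, and vids are small integers starting at 0. Step 3 is assumed to freeze whichever VDI is working at that moment and to create a new working VDI whose parent is the vid the client read in step 1. run_parallel forces the worst-case interleaving, in which every client finishes step 1 before any client starts step 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model of the race (hypothetical, not Sheepdog code). */
#define TOY_MAX_VDIS    16
#define TOY_TAG_LEN     16
#define TOY_MAX_CLIENTS 8

struct toy_inode {
	char tag[TOY_TAG_LEN];
	bool is_snapshot;
	int parent;                 /* vid this VDI was created on top of */
};

struct toy_cluster {
	struct toy_inode inode[TOY_MAX_VDIS]; /* indexed by vid */
	int nr_vdis;                          /* next free vid */
	int working_vid;                      /* current working VDI */
};

static void toy_init(struct toy_cluster *c)
{
	memset(c, 0, sizeof(*c));
	c->nr_vdis = 1;             /* vid 0 is the initial working VDI */
	c->working_vid = 0;
	c->inode[0].parent = -1;
}

/* Step 1: get the working VDI's ID. */
static int step_get_working_vid(const struct toy_cluster *c)
{
	return c->working_vid;
}

/* Step 2: write the tag to the inode of the vid read in step 1. */
static void step_write_tag(struct toy_cluster *c, int vid, const char *tag)
{
	snprintf(c->inode[vid].tag, TOY_TAG_LEN, "%s", tag);
}

/*
 * Step 3 (assumed behaviour): freeze the VDI that is working *now* and
 * create a new working VDI on top of base_vid.  If base_vid is stale,
 * this is effectively a rollback to base_vid.
 */
static void step_create_snapshot(struct toy_cluster *c, int base_vid)
{
	int new_vid;

	assert(c->nr_vdis < TOY_MAX_VDIS);
	c->inode[c->working_vid].is_snapshot = true;
	new_vid = c->nr_vdis++;
	c->inode[new_vid].parent = base_vid;
	c->working_vid = new_vid;
}

static int count_tagged_snapshots(const struct toy_cluster *c)
{
	int vid, n = 0;

	for (vid = 0; vid < c->nr_vdis; vid++)
		if (c->inode[vid].is_snapshot && c->inode[vid].tag[0] != '\0')
			n++;
	return n;
}

/* Serial: each client runs all three steps before the next one starts. */
static void run_serial(struct toy_cluster *c, const char *const tags[], int n)
{
	int i;

	for (i = 0; i < n; i++) {
		int vid = step_get_working_vid(c);

		step_write_tag(c, vid, tags[i]);
		step_create_snapshot(c, vid);
	}
}

/* Parallel, worst case: all clients do step 1, then step 2, then step 3. */
static void run_parallel(struct toy_cluster *c, const char *const tags[], int n)
{
	int base[TOY_MAX_CLIENTS];
	int i;

	assert(n <= TOY_MAX_CLIENTS);
	for (i = 0; i < n; i++)
		base[i] = step_get_working_vid(c);   /* everyone reads the same vid */
	for (i = 0; i < n; i++)
		step_write_tag(c, base[i], tags[i]); /* last writer wins */
	for (i = 0; i < n; i++)
		step_create_snapshot(c, base[i]);
}
```

With tags aaa to eee, run_serial leaves five tagged snapshots. run_parallel leaves only one. Vid 0 holds whichever tag was written last (eee in this ordering), and the other four snapshots carry no tag. The final working VDI is parented on vid 0 rather than on the previous working VDI.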
One idea is for sheep to simply reject SD_OP_NEW_VDI when the request is to create a snapshot and the given vid is already a snapshot. However, that does not work, because a rollback produces exactly this case.
The sheep.log shows that rebase_vdi (sheep/vdi.c#L1285), rather than snapshot_vdi (sheep/vdi.c#L1215), was called four times. I think this is because the vid every 'dog ...' process got (31b5dc) had already become a snapshot once the first snapshot was taken.
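Why the rejection idea fails can be shown with a model of what sheep receives. This sketch rests on assumptions rather than on sheep's actual code:

- A request for an existing VDI is reduced to a name, a base vid and a snapshot flag.
- 'dog vdi rollback' fills that request exactly as 'dog vdi snapshot' does.
- Sheep takes the snapshot_vdi path only when the base vid is still the working VDI.

All toy_* names are hypothetical. Under these assumptions, a snapshot request built from a stale vid and a genuine rollback to the same vid are equal field for field. No check on these fields alone can reject the first without also rejecting the second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of SD_OP_NEW_VDI for an existing VDI under the
 * current protocol: a name, a base vid and a "snapshot" flag.  Field
 * names are illustrative and do not match Sheepdog's struct sd_req.
 */
struct toy_new_vdi_req {
	char name[32];
	uint32_t base_vid;
	bool snapshot;
};

enum toy_path { TOY_SNAPSHOT_VDI, TOY_REBASE_VDI };

static void toy_fill(struct toy_new_vdi_req *req, const char *name,
		     uint32_t base_vid)
{
	assert(strlen(name) < sizeof(req->name));
	memset(req, 0, sizeof(*req));
	strcpy(req->name, name);
	req->base_vid = base_vid;
	req->snapshot = true;
}

/* 'dog vdi snapshot': base is the working vid dog looked up earlier,
 * which may be stale by the time sheep handles the request. */
static void toy_snapshot_req(struct toy_new_vdi_req *req, const char *name,
			     uint32_t seen_working_vid)
{
	toy_fill(req, name, seen_working_vid);
}

/* 'dog vdi rollback' (assumed): base is the snapshot to roll back to. */
static void toy_rollback_req(struct toy_new_vdi_req *req, const char *name,
			     uint32_t snapshot_vid)
{
	toy_fill(req, name, snapshot_vid);
}

static bool toy_req_equal(const struct toy_new_vdi_req *a,
			  const struct toy_new_vdi_req *b)
{
	return strcmp(a->name, b->name) == 0 &&
	       a->base_vid == b->base_vid && a->snapshot == b->snapshot;
}

/* Assumed dispatch: base still working -> snapshot_vdi, else rebase_vdi. */
static enum toy_path toy_dispatch(const struct toy_new_vdi_req *req,
				  uint32_t current_vid)
{
	return req->base_vid == current_vid ? TOY_SNAPSHOT_VDI
					    : TOY_REBASE_VDI;
}
```

For example, suppose 31b5dd is the working VDI. A snapshot request built from the stale vid 31b5dc and a rollback request to 31b5dc compare equal, and both dispatch to TOY_REBASE_VDI.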
$ grep -E '(rebase|snapshot)_vdi' /tmp/0/sheep.log
Nov 01 15:52:54 DEBUG [block] snapshot_vdi(1224) test_vdi01: size 1073741824, vid 31b5dd, base 31b5dc, copies 1, block_size_shift 22, snapid 2
Nov 01 15:52:55 DEBUG [block] rebase_vdi(1295) test_vdi01: size 1073741824, vid 31b5de, base 31b5dc, cur 31b5dd, copies 1, block_size_shift 22, snapid 3
Nov 01 15:52:55 DEBUG [block] rebase_vdi(1295) test_vdi01: size 1073741824, vid 31b5df, base 31b5dc, cur 31b5de, copies 1, block_size_shift 22, snapid 4
Nov 01 15:52:55 DEBUG [block] rebase_vdi(1295) test_vdi01: size 1073741824, vid 31b5e0, base 31b5dc, cur 31b5df, copies 1, block_size_shift 22, snapid 5
Nov 01 15:52:55 DEBUG [block] rebase_vdi(1295) test_vdi01: size 1073741824, vid 31b5e1, base 31b5dc, cur 31b5e0, copies 1, block_size_shift 22, snapid 6
Problems:
- [1] atomicity
- [2] telling sheep the intent of SD_OP_NEW_VDI
TODO:
Protocol:
- For [1], the request body of SD_OP_NEW_VDI should contain both the name and the tag
- For [2], sd_req.vdi.snapid should be three-valued: BRANDNEW (0), SNAPSHOT (1) or ROLLBACK (2)
dog:
- For [1], 'dog vdi snapshot' should send SD_OP_NEW_VDI with the tag in its body, rather than sending SD_OP_WRITE_OBJ to write the snapshot tag
- For [2], 'dog vdi snapshot' should set sd_req.vdi.snapid to SNAPSHOT
- For [2], 'dog vdi rollback' should set sd_req.vdi.snapid to ROLLBACK
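Here is one possible shape for the proposed request, sketched under assumptions. The three intent values come from the TODO above. The struct layout, the 256-byte name and tag fields and all toy_* names are hypothetical, and none of them reflect Sheepdog's actual struct sd_req. In this sketch, 'dog vdi snapshot' puts the name, the tag and SNAPSHOT into a single request instead of issuing a separate SD_OP_WRITE_OBJ. 'dog vdi rollback' states ROLLBACK explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Proposed values for sd_req.vdi.snapid, per the TODO list. */
enum toy_new_vdi_intent {
	TOY_BRANDNEW = 0,
	TOY_SNAPSHOT = 1,
	TOY_ROLLBACK = 2,
};

#define TOY_NAME_LEN 256    /* hypothetical field sizes */
#define TOY_TAG_LEN  256

/* Hypothetical SD_OP_NEW_VDI request: header fields plus a body that
 * carries both the VDI name and the snapshot tag. */
struct toy_new_vdi_req {
	uint32_t base_vid;
	uint32_t intent;            /* one of enum toy_new_vdi_intent */
	char name[TOY_NAME_LEN];
	char tag[TOY_TAG_LEN];      /* empty unless intent == TOY_SNAPSHOT */
};

/* Copy src into a fixed-size field, failing instead of truncating. */
static int toy_copy_field(char *dst, size_t size, const char *src)
{
	size_t len = strlen(src);

	if (len >= size)
		return -1;
	memcpy(dst, src, len + 1);
	return 0;
}

/* 'dog vdi snapshot -s TAG NAME': one request that already holds the tag. */
static int toy_build_snapshot_req(struct toy_new_vdi_req *req,
				  const char *name, const char *tag,
				  uint32_t working_vid)
{
	assert(req && name && tag);
	memset(req, 0, sizeof(*req));
	req->base_vid = working_vid;
	req->intent = TOY_SNAPSHOT;
	if (toy_copy_field(req->name, sizeof(req->name), name) < 0)
		return -1;
	return toy_copy_field(req->tag, sizeof(req->tag), tag);
}

/* 'dog vdi rollback': base is the snapshot to roll back to, no tag. */
static int toy_build_rollback_req(struct toy_new_vdi_req *req,
				  const char *name, uint32_t snapshot_vid)
{
	assert(req && name);
	memset(req, 0, sizeof(*req));
	req->base_vid = snapshot_vid;
	req->intent = TOY_ROLLBACK;
	return toy_copy_field(req->name, sizeof(req->name), name);
}
```

Rejecting an over-long tag instead of truncating it is a choice made in this sketch. A silently shortened tag would be another way for the stored tag to differ from what the user asked for.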
sheep:
- should reject SD_OP_NEW_VDI in either of the following cases:
  - sd_req.vdi.snapid is SNAPSHOT and base_vid != current_vid
  - sd_req.vdi.snapid is ROLLBACK and base_vid == current_vid
- should write the given tag
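The sheep-side rules can be sketched as a check performed in the same step as the tag write and the snapshot. This is a single-threaded toy with hypothetical names and placeholder result codes, and vids are small integers. In sheep, the check, the tag write and the snapshot would all have to happen under one lock or as one operation for [1] to hold. The toy does not model that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Proposed intent values for sd_req.vdi.snapid. */
enum toy_intent { TOY_BRANDNEW = 0, TOY_SNAPSHOT = 1, TOY_ROLLBACK = 2 };

enum { TOY_OK = 0, TOY_REJECTED = -1 };  /* placeholder result codes */

#define TOY_MAX_VDIS 16
#define TOY_TAG_LEN  16

/* Hypothetical request: base vid, intent and (for snapshots) the tag. */
struct toy_req {
	int base_vid;
	enum toy_intent intent;
	const char *tag;
};

/* Hypothetical state sheep keeps for one VDI and its snapshots. */
struct toy_sheep {
	char tag[TOY_MAX_VDIS][TOY_TAG_LEN];
	bool is_snapshot[TOY_MAX_VDIS];
	int nr_vdis;        /* next free vid */
	int current_vid;    /* working VDI */
};

static bool toy_should_reject(const struct toy_req *req, int current_vid)
{
	/* Snapshot of a VDI that is no longer working: would silently roll back. */
	if (req->intent == TOY_SNAPSHOT && req->base_vid != current_vid)
		return true;
	/* Rollback whose base is the working VDI: nothing to roll back to. */
	if (req->intent == TOY_ROLLBACK && req->base_vid == current_vid)
		return true;
	return false;
}

/*
 * Handle SD_OP_NEW_VDI for an existing VDI.  The check, the tag write and
 * the snapshot happen in one call; real sheep would need a lock around
 * all three, which this single-threaded toy does not model.
 */
static int toy_handle_new_vdi(struct toy_sheep *s, const struct toy_req *req)
{
	if (req->intent == TOY_BRANDNEW)
		return TOY_REJECTED;    /* creating a new VDI is out of scope here */
	if (toy_should_reject(req, s->current_vid))
		return TOY_REJECTED;
	assert(s->nr_vdis < TOY_MAX_VDIS);
	if (req->intent == TOY_SNAPSHOT)
		snprintf(s->tag[s->current_vid], TOY_TAG_LEN, "%s",
			 req->tag ? req->tag : "");
	s->is_snapshot[s->current_vid] = true;  /* freeze the working VDI */
	s->current_vid = s->nr_vdis++;          /* and start a new one */
	return TOY_OK;
}
```

Replay the worst-case parallel interleaving against this handler: five SNAPSHOT requests, all with base vid 0. Only the first succeeds and keeps its tag. The other four are rejected instead of silently rebasing. A rollback whose base is the working VDI itself is also refused.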
Risk:
- data loss if a snapshot is taken, data is then written, and an unintended rollback follows
- no data loss if nothing is written