shine
Remote target format actions are sequential
Shine version: 0.907
Clush version: 1.3-1
The format commands do not all start at the same time; they are grouped by node.
[root@berlin0 ~]# shine format -f 30osts
Format 30osts on berlin[0,4-7]: are you sure? (y)es/(N)o: y
Starting format of 32 targets on berlin[0,4-7]
[10:11] In progress for 2 target(s) on berlin0 ...
[10:11] In progress for 9 target(s) on berlin[0,7] ...
[10:31] In progress for 8 target(s) on berlin[0,7] ...
[10:32] In progress for 5 target(s) on berlin[0,7] ...
[10:33] In progress for 3 target(s) on berlin[0,7] ...
[10:33] In progress for 8 target(s) on berlin[0,4] ...
[10:56] In progress for 4 target(s) on berlin[0,4] ...
[10:59] In progress for 8 target(s) on berlin5 ...
[11:22] In progress for 4 target(s) on berlin5 ...
[11:23] In progress for 3 target(s) on berlin5 ...
[11:23] In progress for 7 target(s) on berlin6 ...
[11:44] In progress for 4 target(s) on berlin6 ...
[11:48] In progress for 3 target(s) on berlin6 ...
Format successful.
FILESYSTEM COMPONENTS STATUS (30osts)
+-----+---+------------+--------+
|type |# | nodes | status |
+-----+---+------------+--------+
|MGT | 1 |berlin0 |offline |
|MDT | 1 |berlin0 |offline |
|OST |30 |berlin[4-7] |offline |
+-----+---+------------+--------+
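The grouping by node can be read directly off the progress lines above. As a rough illustration, here is a minimal Python sketch that records the first and last timestamp seen for each node set. The helper name `node_windows` and the regular expression are assumptions made for this example; they are not part of Shine.

```python
import re

# Progress lines copied from the format run above.
LOG = """\
[10:11] In progress for 2 target(s) on berlin0 ...
[10:11] In progress for 9 target(s) on berlin[0,7] ...
[10:31] In progress for 8 target(s) on berlin[0,7] ...
[10:32] In progress for 5 target(s) on berlin[0,7] ...
[10:33] In progress for 3 target(s) on berlin[0,7] ...
[10:33] In progress for 8 target(s) on berlin[0,4] ...
[10:56] In progress for 4 target(s) on berlin[0,4] ...
[10:59] In progress for 8 target(s) on berlin5 ...
[11:22] In progress for 4 target(s) on berlin5 ...
[11:23] In progress for 3 target(s) on berlin5 ...
[11:23] In progress for 7 target(s) on berlin6 ...
[11:44] In progress for 4 target(s) on berlin6 ...
[11:48] In progress for 3 target(s) on berlin6 ...
"""

# Assumed line shape: "[HH:MM] In progress for N target(s) on <nodeset> ..."
LINE_RE = re.compile(r"\[(\d{2}:\d{2})\] In progress for \d+ target\(s\) on (\S+)")


def node_windows(log):
    """Map each node set to (first timestamp, last timestamp), in log order."""
    windows = {}
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp, nodes = match.groups()
        first = windows.get(nodes, (stamp, stamp))[0]
        windows[nodes] = (first, stamp)
    return windows


if __name__ == "__main__":
    for nodes, (first, last) in node_windows(LOG).items():
        print(f"{nodes:12} {first} -> {last}")
```

On this log the windows barely overlap: berlin[0,7] runs from 10:11 to 10:33, berlin[0,4] from 10:33 to 10:56, berlin5 from 10:59 to 11:23 and berlin6 from 11:23 to 11:48, which is what one would expect if the nodes are handled one after another.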
Reported by: ohargoaa
description
modified (diff)
Bug #77 tracks a similar issue, but it does not affect the 'format' command, so I'm surprised by this.
Are you using loopback devices? What is the output if you run the same command with -v and -d?
Original comment by: degremont
attachment
set to shine_format.log
Format log with -d -v options
Original comment by: ohargoaa
attachment
set to shine_start.log
Log with shine start -d -v
Original comment by: ohargoaa
The problem is the same for the start and format actions. Loop devices are used only for the MDT and MGT.
Original comment by: ohargoaa
owner
changed from st-cea to ad-cea
priority
changed from major to minor
status
changed from new to accepted
milestone
set to 0.909
Ok, I've found why.
You've got a special configuration where some of your devices are loopback devices. When Shine detects loopback devices, it switches to a sequential mode for its actions because of Lustre bug BZ18624 (this is true for start, stop and format). As a connection to a distant node is handled the same way as a mount or a mkfs, the connections to other nodes are also sequential.
Loopback devices are not supported in production, so this issue will not be considered very important. It could certainly be improved, but the fix is not trivial.
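To illustrate the effect described above, here is a minimal Python sketch. It is not Shine's actual code: the function names, the `/dev/loop*` check and the device paths are all assumptions made for this example. It only shows how a single concurrency limit shared by local actions and remote connections makes every node wait its turn as soon as one loopback device is present.

```python
from concurrent.futures import ThreadPoolExecutor


def uses_loopback(targets):
    """Hypothetical check: treat any /dev/loop* device as a loopback device."""
    return any(t["dev"].startswith("/dev/loop") for t in targets)


def build_jobs(targets, local_node):
    """One job per local target, plus one connection job per remote node."""
    jobs = [("format", t["name"]) for t in targets if t["node"] == local_node]
    remote_nodes = sorted({t["node"] for t in targets if t["node"] != local_node})
    jobs += [("connect", node) for node in remote_nodes]
    return jobs


def fanout_for(targets, jobs):
    """A single loopback device forces every job, remote connections
    included, to run one at a time; otherwise all jobs may run together."""
    return 1 if uses_loopback(targets) else max(1, len(jobs))


def run(targets, local_node, worker):
    """Run every job through one pool whose size is the shared limit."""
    jobs = build_jobs(targets, local_node)
    with ThreadPoolExecutor(max_workers=fanout_for(targets, jobs)) as pool:
        return list(pool.map(worker, jobs))


# Illustrative layout only: device paths are made up for the example.
targets = [
    {"name": "MGT", "node": "berlin0", "dev": "/dev/loop0"},
    {"name": "OST0", "node": "berlin4", "dev": "/dev/sdb"},
    {"name": "OST1", "node": "berlin5", "dev": "/dev/sdc"},
]
jobs = build_jobs(targets, "berlin0")
print(jobs, fanout_for(targets, jobs))
```

With the MGT on a loop device the limit drops to 1, so the connections to berlin4 and berlin5 are opened one after the other even though their own devices are not loopback devices. One possible improvement, under the same assumptions, would be to apply the limit only to the jobs that actually touch a loopback device.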
Original comment by: degremont
milestone
changed from 0.909 to 0.910
Original comment by: degremont
milestone
changed from 0.910 to 0.911
Original comment by: degremont
- Description has changed:
Diff:
- Milestone: 1.3 --> 1.4
- Resolution: -->
Original comment by: degremont
- Milestone: 1.4 --> 1.5
Original comment by: degremont
- Milestone: 1.5 --> 1.6
Original comment by: degremont
FWIW, BZ18624 has been fixed in Lustre 1.8.2 (I think; it is listed in the release notes https://fossies.org/linux/lustre/lustre/ChangeLog ), so I think we can simply stop activating sequential mode there...
(Saw this by chance, it has been annoying me when just the mgt was on a loop device for "large" tests; having folks do clush -w @servers shine -L format -y isn't really comfortable...)