
Remote target format actions are sequential

Open degremont opened this issue 15 years ago • 11 comments

Shine version: 0.907 / Clush version: 1.3-1

The format commands do not all start at the same time; they are grouped by node.

[root@berlin0 ~]# shine format -f 30osts
Format 30osts on berlin[0,4-7]: are you sure? (y)es/(N)o: y
Starting format of 32 targets on berlin[0,4-7]
[10:11] In progress for 2 target(s) on berlin0 ...
[10:11] In progress for 9 target(s) on berlin[0,7] ...
[10:31] In progress for 8 target(s) on berlin[0,7] ...
[10:32] In progress for 5 target(s) on berlin[0,7] ...
[10:33] In progress for 3 target(s) on berlin[0,7] ...
[10:33] In progress for 8 target(s) on berlin[0,4] ...
[10:56] In progress for 4 target(s) on berlin[0,4] ...
[10:59] In progress for 8 target(s) on berlin5 ...
[11:22] In progress for 4 target(s) on berlin5 ...
[11:23] In progress for 3 target(s) on berlin5 ...
[11:23] In progress for 7 target(s) on berlin6 ...
[11:44] In progress for 4 target(s) on berlin6 ...
[11:48] In progress for 3 target(s) on berlin6 ...
Format successful.
FILESYSTEM COMPONENTS STATUS (30osts)
+-----+---+------------+--------+
|type |#  |   nodes    | status |
+-----+---+------------+--------+
|MGT  | 1 |berlin0     |offline |
|MDT  | 1 |berlin0     |offline |
|OST  |30 |berlin[4-7] |offline |
+-----+---+------------+--------+

Reported by: ohargoaa

degremont avatar Sep 10 '10 11:09 degremont

        description
          modified (diff)

Bug #77 tracks a similar issue, but it does not affect the 'format' command, so I'm surprised by this.

Are you using loopback devices? What is the output if you run the same command with -v and -d?

Original comment by: degremont

degremont avatar Sep 15 '10 11:09 degremont

        attachment
            set to shine_format.log

Format log with -d -v options

Original comment by: ohargoaa

degremont avatar Sep 22 '10 06:09 degremont

        attachment
            set to shine_start.log

Log with shine start -d -v

Original comment by: ohargoaa

degremont avatar Sep 22 '10 06:09 degremont

The problem is the same for the start and format actions. Loop devices are used only for the MDT and MGT.

Original comment by: ohargoaa

degremont avatar Sep 22 '10 06:09 degremont

        owner
          changed from st-cea to ad-cea
        
        priority
            changed from major to minor
        
        status
            changed from new to accepted
        
        milestone
            set to 0.909

Ok, I've found why.

You have a special configuration where some of your devices are loopback devices. Because of Lustre bug BZ18624, when Shine detects loopback devices it switches to a sequential mode for its actions (this applies to start, stop and format). Since the connection to a remote node is handled the same way as a mount or a mkfs, the connections to the other nodes are also made sequentially.

Loopback devices are not supported in production, so this issue is not considered high priority. It could certainly be improved, but the fix is not trivial.
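To illustrate why serializing actions also serializes the fan-out to remote nodes, here is a minimal sketch. It is NOT Shine's code: run_on_nodes and fake_format are hypothetical names, and the model simply assumes that contacting a node goes through the same sequential queue as a local mount or mkfs. In sequential mode, each node's action must finish before the next node is contacted, so total time grows with the number of nodes.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


# Hypothetical dispatcher, not Shine's implementation: it only models the case
# where the remote connection shares the queue used for local mount/mkfs actions.
def run_on_nodes(actions: Dict[str, Callable[[], str]],
                 sequential: bool) -> Tuple[Dict[str, str], float]:
    """Run one action per node; return per-node results and elapsed seconds."""
    start = time.monotonic()
    if sequential:
        # Each node's action must complete before the next node is contacted.
        results = {node: action() for node, action in actions.items()}
    else:
        # All nodes are contacted at once and run their actions concurrently.
        with ThreadPoolExecutor(max_workers=max(1, len(actions))) as pool:
            futures = {node: pool.submit(action) for node, action in actions.items()}
            results = {node: fut.result() for node, fut in futures.items()}
    return results, time.monotonic() - start


def fake_format(node: str, seconds: float) -> Callable[[], str]:
    """Hypothetical stand-in for a remote 'format' that takes `seconds`."""
    def action() -> str:
        time.sleep(seconds)
        return f"{node}: formatted"
    return action


if __name__ == "__main__":
    nodes = ["berlin4", "berlin5", "berlin6", "berlin7"]
    actions = {n: fake_format(n, 0.2) for n in nodes}
    for sequential in (True, False):
        _, elapsed = run_on_nodes(actions, sequential=sequential)
        print(f"{'sequential' if sequential else 'parallel'}: {elapsed:.2f}s")
```

With four nodes, the sequential run takes roughly four times as long as the parallel one, which matches the node-by-node progression seen in the format output above.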

Original comment by: degremont

degremont avatar Sep 22 '10 12:09 degremont

        milestone
            changed from 0.909 to 0.910

Original comment by: degremont

degremont avatar Sep 23 '10 19:09 degremont

        milestone
            changed from 0.910 to 0.911

Original comment by: degremont

degremont avatar Jan 20 '11 10:01 degremont

  • Description has changed:

Diff:




  • Milestone: 1.3 --> 1.4
  • Resolution: -->

Original comment by: degremont

degremont avatar Oct 10 '13 08:10 degremont

  • Milestone: 1.4 --> 1.5

Original comment by: degremont

degremont avatar Apr 30 '15 09:04 degremont

  • Milestone: 1.5 --> 1.6

Original comment by: degremont

degremont avatar May 24 '17 13:05 degremont

FWIW, BZ18624 was fixed in Lustre 1.8.2 (I think; it is listed in the release notes: https://fossies.org/linux/lustre/lustre/ChangeLog), so I think we can simply stop activating sequential mode there...

(I saw this by chance. It has been annoying me when just the MGT was on a loop device for "large" tests; having folks run clush -w @servers shine -L format -y isn't really comfortable...)
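The suggestion above, keeping sequential mode only for Lustre versions older than the BZ18624 fix, could look like the following minimal sketch. The helper names are hypothetical and not part of Shine, the 1.8.2 threshold is taken from the comment (whose author was not fully sure of it), and version strings are assumed to be purely numeric and dotted (a suffix such as "-RC1" would need extra parsing).

```python
from typing import Tuple

# Assumed threshold from the comment above; not verified against Shine itself.
BZ18624_FIXED_IN: Tuple[int, ...] = (1, 8, 2)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '1.8.2' into (1, 8, 2)."""
    return tuple(int(part) for part in text.split("."))


def needs_sequential_mode(lustre_version: str, uses_loop_devices: bool) -> bool:
    """Hypothetical check: serialize actions only for loop devices on old Lustre."""
    # Tuple comparison orders versions component by component: (1, 8, 1) < (1, 8, 2).
    return uses_loop_devices and parse_version(lustre_version) < BZ18624_FIXED_IN
```

With this check, a configuration where only the MGT sits on a loop device would keep the normal parallel fan-out on Lustre 1.8.2 and later.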

martinetd avatar Nov 21 '19 06:11 martinetd