moosefs 3.0.105: temporary chunkserver maintenance is not working as expected

Replication starts immediately as soon as one chunkserver is stopped without waiting for CS_TEMP_MAINTENANCE_MODE_TIMEOUT.

Ideally there should be a way to delay replication of "undergoal" or "endangered" chunks as a result of a temporary chunkserver shutdown.

It might also be useful to stop some chunkservers overnight, in which case replication would need to be delayed too.

Apr 25 '19 22:04 onlyjob

Hi Dmitry,

We are not aware of any issues related to Maintenance Mode; it should work properly. Temporary Maintenance Mode is enabled automatically (for 30 minutes by default) when a Chunkserver is stopped gracefully. Moreover, for replications to be suppressed, all disconnected Chunkservers must be in Maintenance Mode. For example, if there are two disconnected Chunkservers, one with MM enabled and one with MM disabled, all replications are performed.

Maintenance Mode can also be enabled or disabled manually from MooseFS CGI / CLI ("Servers" tab). Maintenance Mode enabled manually must also be disabled by hand; in this case CS_TEMP_MAINTENANCE_MODE_TIMEOUT does not apply.
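
To make the difference between the two modes concrete, here is a minimal Python sketch. It is not MooseFS code: the class and method names are hypothetical, the 30-minute value is the default mentioned above, and the rule that a graceful stop leaves a manually enabled mode untouched is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Default of 30 minutes as stated above; the constant mirrors the option name.
CS_TEMP_MAINTENANCE_MODE_TIMEOUT = 30 * 60  # seconds


class Maintenance(Enum):
    OFF = "off"
    ON = "on"            # enabled by hand; never expires on its own
    TEMP = "on (temp)"   # enabled automatically on graceful stop


@dataclass
class ChunkserverState:
    """Hypothetical per-chunkserver record kept by the master."""
    mode: Maintenance = Maintenance.OFF
    temp_started_at: Optional[float] = None  # seconds on some monotonic clock

    def graceful_stop(self, now: float) -> None:
        # Assumption: a manually enabled mode is left alone.
        if self.mode is Maintenance.OFF:
            self.mode = Maintenance.TEMP
            self.temp_started_at = now

    def effective_mode(self, now: float) -> Maintenance:
        # Temporary mode lapses after the timeout; manual mode does not.
        if self.mode is Maintenance.TEMP and self.temp_started_at is not None:
            if now - self.temp_started_at >= CS_TEMP_MAINTENANCE_MODE_TIMEOUT:
                return Maintenance.OFF
        return self.mode
```

In this sketch a server stopped at time 0 reports temporary maintenance until 1800 seconds have passed, while a server switched on by hand stays in maintenance indefinitely.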

Best regards, Piotr / MooseFS Team

Apr 26 '19 09:04 pkonopelko

I observe the problem when taking down just one chunkserver gracefully, with systemctl stop moosefs-chunkserver. Systemd's TimeoutStopSec is not defined for the service, and the system default (as per DefaultTimeoutStopSec) is 90 seconds. The chunkserver is stopped gracefully, yet every time it is stopped replication starts despite the maintenance:ON (temp) status in CGI...

Apr 26 '19 23:04 onlyjob

How does the master know if chunkserver has been stopped gracefully? Assuming that's the problem, why does it not affect chunkserver's maintenance status in CGI?

Apr 26 '19 23:04 onlyjob

Almost three months passed... Could we get some attention here please?

This issue is very annoying as it causes undesirable replication (and excessive I/O) every time a chunkserver goes into temporary maintenance mode (like when a computer is suspended). CGI shows "maintenance: ON (temp)" yet it has no effect...

Jul 22 '19 23:07 onlyjob

Hi. The problem is that we are unable to reproduce it. Whenever we stop a chunkserver gracefully we see "maintenance: ON (temp)" in CGI and we do not see any replications.

How can we reproduce it? Could you send us some screenshots and more information, such as the number of chunkservers?

Jul 24 '19 06:07 acid-maker

Screenshots would be redundant to what I have reported already. Replication affects chunks in the archive label with two replicas. Graceful stopping of a chunkserver triggers immediate replication... Originally there were two chunkservers, but adding a third one (not fully backfilled yet) did not improve the situation... Not sure what else I could contribute...

IMHO graceful shut down is irrelevant. Temporary maintenance mode should be always consistent with CGI-reported status of a chunkserver.

Jul 24 '19 06:07 onlyjob

That is enough for now. We will try to reproduce it using two or three chunkservers.

Jul 24 '19 07:07 acid-maker

We are still unable to reproduce it.

> How does the master know if chunkserver has been stopped gracefully?

Graceful stopping is done by sending a SIGTERM signal to the chunkserver daemon, which intercepts it. One of the tasks performed during exit is sending a "DISCONNECT" packet to the master. When the master receives this packet, it switches this chunkserver to "temporary maintenance mode" in its data structures.
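
The sequence described here can be sketched in Python as follows. This is only an illustration of the pattern, not the actual chunkserver or master code: the classes, the `send_to_master` callback, and the string used for the packet are all hypothetical stand-ins.

```python
import signal
from typing import Callable, Set


class ChunkserverDaemon:
    """Toy daemon: SIGTERM leads to a clean exit that notifies the master."""

    def __init__(self, send_to_master: Callable[[str], None]) -> None:
        self._send = send_to_master  # hypothetical transport callback
        self.terminating = False

    def install_handlers(self) -> None:
        # Intercept SIGTERM instead of letting it kill the process outright.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Only set a flag here; the exit tasks run in shutdown().
        self.terminating = True

    def shutdown(self) -> None:
        # One of the exit tasks: tell the master this stop is intentional.
        if self.terminating:
            self._send("DISCONNECT")


class Master:
    """Toy master: remembers which servers announced their disconnect."""

    def __init__(self) -> None:
        self.temp_maintenance: Set[str] = set()

    def on_packet(self, server_id: str, packet: str) -> None:
        if packet == "DISCONNECT":
            self.temp_maintenance.add(server_id)
```

A daemon that dies without running its exit tasks (a crash or SIGKILL) never sends the packet, so in this sketch the master never marks it as being in temporary maintenance.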

> Assuming that's the problem, why does it not affect chunkserver's maintenance status in CGI?

No. This is not the case. There is only one place with info about maintenance mode - master data structures. Data shown in CGI are received from the master, from the same data structures that are used in replication algorithms to determine if replication should be performed or not.

> IMHO graceful shut down is irrelevant. Temporary maintenance mode should be always consistent with CGI-reported status of a chunkserver.

Yes, it is. It has to be, because there is only one place where info about maintenance mode is stored.

There has to be another reason for such behaviour. Are you sure that you do not have other disconnected servers? As Piotr mentioned, when you have one server permanently disconnected (not in maintenance mode, just a regular disconnected server) and then you gracefully disconnect another server, replications will start, because the condition for blocking replications is "all disconnected servers are in maintenance mode". In this case one is in maintenance mode (temp) and the other one is not (even if this one has no chunks).

We need such a condition because we assume that a disconnected server which is not in maintenance mode is broken, and therefore we need to start replication. MFS doesn't store information about chunks that were on disconnected servers, so it can't distinguish chunks that were on a permanently disconnected server (or one that just broke down) from chunks that were on a server in maintenance mode. Because of that, all chunks are replicated (it's always better to be on the safe side).
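
Expressed as code, the blocking condition described above might look like the following Python sketch. The `Chunkserver` record and the `replication_blocked` function are hypothetical names used for illustration; the real master works on its own data structures.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Chunkserver:
    name: str
    connected: bool
    maintenance: bool  # True for both manual and temporary maintenance


def replication_blocked(servers: Iterable[Chunkserver]) -> bool:
    """Return True only if every disconnected server is in maintenance mode."""
    disconnected = [s for s in servers if not s.connected]
    # With nothing disconnected there is nothing to hold back.
    if not disconnected:
        return False
    return all(s.maintenance for s in disconnected)


# The situation from this thread: one long-retired server without
# maintenance mode, plus one server that was just stopped gracefully.
retired = Chunkserver("retired", connected=False, maintenance=False)
stopped = Chunkserver("cs2", connected=False, maintenance=True)
running = Chunkserver("cs1", connected=True, maintenance=False)

print(replication_blocked([running, stopped]))           # True
print(replication_blocked([running, stopped, retired]))  # False
```

The second call shows why a single forgotten server defeats temporary maintenance for every other server, even when the forgotten one holds no chunks.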

Jul 24 '19 12:07 acid-maker

Interesting, thanks.

As a matter of fact I do have one chunkserver that was permanently disconnected weeks ago. All chunks were moved away from that chunkserver (which is now inactive) and it has a different label than the one affected by the undesirable replication during maintenance mode -- they are segregated by storage class definitions.

How can chunkserver be automatically removed (after some time) without using CGI's "click to remove"?

How to avoid replication when more than one chunkserver is in maintenance mode?

Perhaps master should remember chunkservers' labels to avoid that problem. Technically speaking decision to replicate should be based on number of available/active replicas and storage class definitions. Hinting from chunkserver availability seems unnecessary...

Jul 24 '19 14:07 onlyjob

It feels wrong that chunkserver that has been stopped in March -- over 4 months ago, is still affecting replication...

Jul 25 '19 02:07 onlyjob

> How can chunkserver be automatically removed (after some time) without using CGI's "click to remove"?

We can add to the mfsmaster.cfg new option "DAYS_TO_REMOVE_UNUSED_CHUNKSERVER" (or something like that) with default value set to 30 or something like that.
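
As a rough illustration of what such an option could do, here is a Python sketch. It describes only the proposal, not an existing MooseFS feature: the `KnownServer` record and `prune_unused` function are hypothetical, and treating "unused" as "disconnected for longer than the configured number of days" is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List

SECONDS_PER_DAY = 86400


@dataclass
class KnownServer:
    name: str
    connected: bool
    last_seen: float  # seconds since some epoch


def prune_unused(
    servers: Iterable[KnownServer],
    now: float,
    days_to_remove: int = 30,  # the default value floated above
) -> List[KnownServer]:
    """Keep connected servers and those seen within the last N days."""
    cutoff = now - days_to_remove * SECONDS_PER_DAY
    return [s for s in servers if s.connected or s.last_seen > cutoff]
```

Once a stale entry is dropped this way, it no longer counts as a disconnected server without maintenance mode, so it can no longer trigger replication on its own.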

> How to avoid replication when more than one chunkserver is in maintenance mode?

Set maintenance mode for all of them. In your case, manually set maintenance mode for all inactive chunkservers. If ALL disconnected chunkservers are in maintenance mode then the master will not start replications.

> Perhaps master should remember chunkservers' labels to avoid that problem. Technically speaking decision to replicate should be based on number of available/active replicas and storage class definitions. Hinting from chunkserver availability seems unnecessary...

I have some ideas on how to improve that. The problem is that we first designed all data structures and implemented all algorithms associated with chunks and their replication, and only years later added "maintenance mode", which is why it is not a perfect solution.

> It feels wrong that chunkserver that has been stopped in March -- over 4 months ago, is still affecting replication...

Yes, I agree. The problem is that I personally don't like it when computer programs are "too intelligent", which is why I usually leave such decisions to the user. Adding the "DAYS_TO_REMOVE_UNUSED_CHUNKSERVER" option should at least help with that.

Jul 25 '19 05:07 acid-maker

> We can add to the mfsmaster.cfg new option "DAYS_TO_REMOVE_UNUSED_CHUNKSERVER" (or something like that) with default value set to 30 or something like that.

Sounds like a good idea. Let's do that please. :)

Thanks.

Jul 25 '19 07:07 onlyjob