longhorn-manager icon indicating copy to clipboard operation
longhorn-manager copied to clipboard

fix(setting): check if the settings are synchronized first.

Open mantissahz opened this issue 1 year ago • 3 comments

Which issue(s) this PR fixes:

Issue longhorn/longhorn#7797

What this PR does / why we need it:

Check if the settings are synchronized first instead of checking if volumes are attached first to eliminate unnecessary error logs.

Special notes for your reviewer:

Additional documentation or context

mantissahz avatar Jan 29 '24 14:01 mantissahz

@PhanLe1010 @derekbit Please help review this.

innobead avatar Jan 29 '24 14:01 innobead

Right now, it will still log even if a particular data engine is disabled even though we already cleanup the corresponding IM and there is no changes in the system.

@PhanLe1010,

Could you describe the Longhorn version, the steps to reproduce it, and show the logs you found? Is the log like this one?

E0127 15:17:07.376434       1 setting_controller.go:208] failed to sync setting for longhorn-system/guaranteed-instance-manager-cpu: current state prevents this: failed to apply guaranteed-instance-manager-cpu setting to Longhorn components when there are attached volumes. It will be eventually applied

Because we depend on the function AreAllVolumesDetached(here) to check if all volumes are detached, it should not return an error (and log) if there is no instance manager for a particular data engine.

mantissahz avatar Jan 31 '24 06:01 mantissahz

Right now, it will still log even if a particular data engine is disabled even though we already cleanup the corresponding IM and there is no changes in the system.

@PhanLe1010,

Could you describe the Longhorn version, the steps to reproduce it, and show the logs you found? Is the log like this one?

E0127 15:17:07.376434       1 setting_controller.go:208] failed to sync setting for longhorn-system/guaranteed-instance-manager-cpu: current state prevents this: failed to apply guaranteed-instance-manager-cpu setting to Longhorn components when there are attached volumes. It will be eventually applied

Because we depend on the function AreAllVolumesDetached(here) to check if all volumes are detached, it should not return an error (and log) if there is no instance manager for a particular data engine.

Sorry for the late response. So with this PR, we have:

  1. When we sync the setting SettingNameV2DataEngine, we enter this line of code https://github.com/mantissahz/longhorn-manager/blob/a9bff9d1a3ddc496808cc13d7989bc6efed839cf/controller/setting_controller.go#L985-L986
  2. The above line of code check if the dataEngineEnabled is enable or not. Assume that the setting is never changed and it is always is false. In this case, we are still printing out the log https://github.com/mantissahz/longhorn-manager/blob/a9bff9d1a3ddc496808cc13d7989bc6efed839cf/datastore/longhorn.go#L419-L420 if there are attached volumes.
  3. The end result is that we will print out that log every 30s even though there is nothing change in the system/setting

WDYT about this concern?

PhanLe1010 avatar Mar 15 '24 01:03 PhanLe1010

@PhanLe1010,

The above line of code check if the dataEngineEnabled is enable or not. Assume that the setting is never changed and it is always is false.

  1. The setting is always false after a fresh install, and it would have any volumes attached. I think it will not print the logs or do you think there will be an error returned from https://github.com/mantissahz/longhorn-manager/blob/a9bff9d1a3ddc496808cc13d7989bc6efed839cf/datastore/longhorn.go#L419-L420?
  2. The setting is enabled and some volumes are attached and then disable the setting. It should be printing logs.

In this case, we are still printing out the log https://github.com/mantissahz/longhorn-manager/blob/a9bff9d1a3ddc496808cc13d7989bc6efed839cf/datastore/longhorn.go#L419-L420 if there are attached volumes.

Do you mean we are printing out the log https://github.com/mantissahz/longhorn-manager/blob/a9bff9d1a3ddc496808cc13d7989bc6efed839cf/datastore/longhorn.go#L423-L425 when volumes are attached with dataEngineEnabled false?

or Do you mean that when dataEngineEnabled is false and https://github.com/mantissahz/longhorn-manager/blob/a9bff9d1a3ddc496808cc13d7989bc6efed839cf/datastore/longhorn.go#L417 will return an error?

  1. I think we should have the log when volumes are attached with dataEngineEnabled false, right?
  2. The log level is Trace (ErrorIsInvalidState) like other danger zone settings.

mantissahz avatar Mar 26 '24 03:03 mantissahz

Do you mean we are printing out the log https://github.com/mantissahz/longhorn-manager/blob/a9bff9d1a3ddc496808cc13d7989bc6efed839cf/datastore/longhorn.go#L423-L425 when volumes are attached with dataEngineEnabled false?

Hi @mantissahz , this is what I meant. We are printing out the logs when there is nothing change in the Longhorn system. Specifically, we are printing out the log when:

  1. There are attached volumes and they have not changed. They are still attached
  2. Setting dataEngineEnabled is false and it has not changed. It is still false.

I think the purpose of this PR is that we should not print out the duplicated logs so we should not have the above behavior. WDYT?

PhanLe1010 avatar Apr 09 '24 00:04 PhanLe1010

After discussing with @mantissahz , we think that it is difficult to achieve the behavior at https://github.com/longhorn/longhorn-manager/pull/2539#issuecomment-2043939546. We can keep the current implementation for now

PhanLe1010 avatar Apr 10 '24 00:04 PhanLe1010

@Mergify backport v1.6.x

mantissahz avatar Apr 18 '24 02:04 mantissahz

backport v1.6.x

✅ Backports have been created

mergify[bot] avatar Apr 18 '24 02:04 mergify[bot]