kafka-config-provider-aws

Connector restarts when secrets have not changed

Open yarinb opened this issue 3 years ago • 7 comments

Hi! We're seeing that our connector restarts constantly every 5 minutes (default TTL), even when the secrets have not changed at all:

[Worker-09cf386d7fd1b475f] [2022-07-04 19:09:23,762] INFO get() - path = 'CDC-connectors' keys = '[rds-debezium-password, rds-debezium-user]' (com.github.jcustenborder.kafka.config.aws.SecretsManagerConfigProvider:78)
[Worker-09cf386d7fd1b475f] [2022-07-04 19:09:23,783] INFO Scheduling a restart of connector cdc-debezium-connector in 300000 ms (org.apache.kafka.connect.runtime.WorkerConfigTransformer:92)

Is this behavior expected? Since we don't expect to change the username/password for the debezium user, can this TTL configuration be set to a value that stops the connector from restarting?

Jul 04 '22 19:07 yarinb

@yarinb Did you manage to find a workaround? We're seeing the exact same behavior in AWS MSK.

Jan 04 '23 12:01 GergelyKalmar

I think that this is configurable via the MSK Connect worker configuration:

key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
config.providers=secretManager
config.providers.secretManager.class=com.github.jcustenborder.kafka.config.aws.SecretsManagerConfigProvider
config.providers.secretManager.param.aws.region=us-east-1
config.providers.secretManager.param.secret.ttl.ms=86400000

Note the added config.providers.secretManager.param.secret.ttl.ms value (changed to 1 day in this case) – documented as secret.ttl.ms in the config provider documentation.

Looks like this configuration works as expected:

[Worker-04aa30dd28e9688e0] [2023-01-04 15:13:38,464] INFO Scheduling a restart of connector debezium-app-core-cdc in 86400000 ms (org.apache.kafka.connect.runtime.WorkerConfigTransformer:92)

Having said that, the connector should definitely not restart when the secrets did not change.

Jan 04 '23 15:01 GergelyKalmar

I'm seeing this too. In my case, the Debezium connector was only about two-thirds of the way through its initial snapshot when it went into an indefinite restart loop, and the topic itself stopped progressing. I'll try setting the secret TTL as a workaround, but this definitely feels like a bug. Does setting it to 1 day mean that the connector will restart daily?

Feb 10 '23 18:02 jslusher

I found this in AWS documentation on external providers and it might be one of the reasons the connector is continually restarting:

By default, MSK Connect frequently restarts a connector when the connector uses a configuration provider. To turn off this restart behavior, you can set the config.action.reload value to none in your connector configuration.

I'm guessing that setting config.action.reload to none would mean that if a password actually does get changed, manual intervention would be needed for the change to reach the connector. Since there's currently no good way to restart an MSK connector without recreating it, the connector would need to be recreated any time a password is changed. I'm going to test this theory. In our case I think it's worth the tradeoff if it works as expected.

Feb 10 '23 19:02 jslusher

After setting config.action.reload to none in the connector configuration (not the worker configuration), the connector stopped continuously restarting. The downside, for a Debezium MySQL connector at least, is that if a password gets changed, the connector will remain connected to the server and will even continue to read binlogs, but once it gets disconnected, it will need to be deleted and recreated to pick up the new password. If offset.storage.topic is set to a fixed name instead of the random one it gets by default, the recreated connector will just pick up where it left off.
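
For reference, here is a minimal sketch of the connector configuration described above. The provider name secretManager and the secret path and keys are borrowed from the worker configuration and log lines earlier in this thread. The connector class and the database.user / database.password property names are assumptions based on a typical Debezium MySQL setup, so adjust them to your connector.

```properties
# Connector configuration (not the worker configuration).
# Assumed Debezium MySQL connector; substitute your own connector class.
connector.class=io.debezium.connector.mysql.MySqlConnector

# Secrets are resolved through the config provider as ${provider:path:key}.
# Provider name, path, and keys are taken from earlier in this thread.
database.user=${secretManager:CDC-connectors:rds-debezium-user}
database.password=${secretManager:CDC-connectors:rds-debezium-password}

# Do not restart the connector when the secret TTL expires.
# Kafka Connect's default for this setting is "restart".
config.action.reload=none
```

With this in place, a rotated password is only picked up once the connector is recreated, as noted above.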

Feb 13 '23 17:02 jslusher

That's actually a good catch; the manual restart might be a better option. I guess it would still be best if the connector could just pick up changed secrets on its own, though.

Feb 16 '23 09:02 GergelyKalmar

@yarinb Did you manage to find a workaround? We're seeing the exact same behavior in AWS MSK.

@GergelyKalmar I am investigating the opposite scenario (how to restart the MSK connector on configuration change), and from reading the docs and the Kafka code I would expect to see exactly what you're experiencing in MSK Connect: a scheduled restart based on the secret.ttl.ms property.

However, I don't see that, and I don't understand why. config.action.reload = restart is set. Could you share your configuration/properties etc.?

Feb 22 '23 17:02 spirosag