cp-ansible

Kafka Connect Module Output

Open Fobhep opened this issue 3 years ago • 12 comments

This is a great feature that makes life much easier, but the output is unfortunately often not very helpful. When deploying n connectors you get a 400 response regardless of whether the JDBC password in one connector is wrong or the format of another connector is invalid. At least this is my impression so far. I haven't had time to look into the code in more detail, but maybe someone can answer whether this is due to the Connector API or whether it could be improved in the module?

Fobhep avatar Sep 15 '20 09:09 Fobhep

@Fobhep Thanks for the question. Where do you get the 400 error? Do you mean after Connect restarts and the health check runs?

If so, the health check simply checks to see if we can query the list of Connectors from the Connect API. So if it ends in 400, this means that Connect has failed to start for some reason.
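For illustration only, a health check of that kind could be sketched as below. This is not the role's actual task; the function name `connect_is_healthy`, the injectable `opener` parameter, and the host name in the usage note are assumptions made for the example. It only relies on `GET /connectors` returning a JSON list of connector names.

```python
import json
import urllib.error
import urllib.request


def connect_is_healthy(base_url, opener=urllib.request.urlopen, timeout=10):
    """Return (healthy, detail) based on GET {base_url}/connectors.

    `opener` is injectable (an assumption of this sketch) so the function
    can be exercised without a running Connect worker.
    """
    url = base_url.rstrip("/") + "/connectors"
    try:
        with opener(url, timeout=timeout) as resp:
            # The endpoint answers with a JSON array of connector names.
            names = json.loads(resp.read().decode("utf-8"))
            return True, "connectors: %s" % ", ".join(names)
    except urllib.error.HTTPError as exc:
        # The worker answered, but with an error status such as 400.
        return False, "HTTP %d %s" % (exc.code, exc.reason)
    except urllib.error.URLError as exc:
        # The worker could not be reached at all.
        return False, "unreachable: %s" % exc.reason
```

Calling `connect_is_healthy("http://connect.example:8083")` (hypothetical host) returns a tuple, so a caller can report the detail string instead of a bare failure.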

Can you confirm where you are receiving the 400 error?

Thanks

JumaX avatar Sep 15 '20 09:09 JumaX

The error happens when running the kafka-connector deployment task: Ansible returns either "Request timed out" or "Bad Request".

After digging through the logs I managed to find exceptions indicating that, e.g., the password for one connector was wrong.

Fobhep avatar Sep 15 '20 10:09 Fobhep

@Fobhep Are you able to share which connectors you tried deploying and which one has the misconfiguration? We want to reproduce this in house.

We think it may be an issue in the Python library, whereby if a new connector fails it doesn't return the error code from the API, whereas if an existing connector update fails it does.

JumaX avatar Sep 16 '20 15:09 JumaX

@JumaX In that particular customer scenario it was JDBC connectors only.

Fobhep avatar Sep 16 '20 15:09 Fobhep

Another thing I noticed only now:

Sometimes I get "HTTP Error: 409 Conflict", but the module itself reports "changed: true".

I am aware that the REST API may return 409 when POSTing while a rebalance is in progress. But shouldn't the module still fail if the POST was not carried out? Or does 409 mean the POST was carried out, but a rebalance was going on at the same time?

Fobhep avatar Oct 09 '20 11:10 Fobhep

Anything new here? This REST API for adding connectors seems to have a mind of its own. I just added a set of 6 JDBC Oracle connectors to it (3 source, 3 sink). The first time I got a 400 Bad Request and nothing was configured. OK, retry with the exact same config: now 1 of 6 is deployed, and I still got a 400 Bad Request.

michaelsstuff avatar Nov 10 '20 14:11 michaelsstuff

This was added as a contribution from the community. I've spoken with the author and he is making it a priority to review this during the week.

JumaX avatar Nov 10 '20 14:11 JumaX

@Fobhep @JumaX Resuming work on this issue now, sorry for the late reply. I'll rewrite the error management so that we get an explicit message/result for each connector.

I'll also see if there's a way to wait for a rebalance to finish. The 409 is indeed the response we get when there's a rebalance, which is why initially I did not treat it as an error, but it's true it masks an error if there's one, which is unfortunate.
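As a rough sketch of the idea (not the code in the PR), per-connector results with a retry on 409 could look like the following. The function `deploy_connectors` and the `send(name, config)` callable are hypothetical; `send` stands in for whatever performs the HTTP request (for example `PUT /connectors/{name}/config`) and returns `(status_code, parsed_json_body)`. The sketch assumes a 409 means the request was not applied and is safe to retry.

```python
import time


def deploy_connectors(connectors, send, retries=5, delay=2.0, sleep=time.sleep):
    """Deploy each connector and report one explicit result per connector.

    connectors: dict mapping connector name to its config dict.
    send: hypothetical callable (name, config) -> (status_code, body).
    """
    results = {}
    attempts = max(retries, 1)
    for name, config in connectors.items():
        for attempt in range(1, attempts + 1):
            status, body = send(name, config)
            if status != 409:
                break
            # Assumption: 409 Conflict signals a rebalance in progress and
            # the request was not applied, so wait and try again.
            if attempt < attempts:
                sleep(delay)
        if status in (200, 201):
            results[name] = {"ok": True, "status": status, "message": ""}
        else:
            # Error bodies from the Connect REST API carry a "message" field.
            message = body.get("message", "") if isinstance(body, dict) else str(body)
            results[name] = {"ok": False, "status": status, "message": message}
    return results
```

With this shape, a connector that still returns 409 after the last attempt is reported as failed rather than as changed, and one bad connector does not hide the outcome of the others.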

ldom avatar Nov 16 '20 10:11 ldom

Quick update: I have completely rewritten the error management and I have added a status check on the tasks of connectors, which means that if a connector fails to initialize, it will be detected and returned as an error. Preparing a PR now.
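To illustrate what such a status check involves (a simplified sketch, not the PR's implementation), the JSON returned by `GET /connectors/{name}/status` can be scanned for failed components. The helper name `failed_components` is made up for this example.

```python
def failed_components(status):
    """Return a list of (component, trace) for every FAILED part of a connector.

    `status` is the parsed JSON document from GET /connectors/{name}/status,
    which holds a "connector" object and a "tasks" list, each with a "state".
    """
    failures = []
    connector = status.get("connector", {})
    if connector.get("state") == "FAILED":
        failures.append(("connector", connector.get("trace", "")))
    for task in status.get("tasks", []):
        if task.get("state") == "FAILED":
            failures.append(("task %s" % task.get("id"), task.get("trace", "")))
    return failures
```

An empty list means nothing is in the FAILED state. It does not prove the connector is running, since a task may still be UNASSIGNED or the task list may be empty shortly after creation, so a caller would likely poll for a short while before deciding.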

ldom avatar Nov 20 '20 19:11 ldom

@ldom any status updates about this?

jamuska avatar Mar 01 '21 12:03 jamuska

@jamuska PR is there but has not been merged yet (https://github.com/confluentinc/cp-ansible/pull/490). I guess Justin is waiting for the molecule tests. I'll work on them this week.

ldom avatar Mar 01 '21 13:03 ldom

@ldom @jamuska Correct, we are waiting on the molecule tests. Let me know if I can be of assistance @ldom.

JumaX avatar Mar 02 '21 16:03 JumaX

Thank you @ldom for your help and contribution! :) We have now added molecule scenarios on top of this PR and fixed some known issues in the kafka_connectors module to support ssl_enabled=true. Here's the merged PR: https://github.com/confluentinc/cp-ansible/pull/1296. The changes should be available in the upcoming Q1 patch release. Closing this one.

wadhwa1 avatar Jan 24 '23 10:01 wadhwa1