
Kafka Connect for z/OS support

Open stan-is-hate opened this issue 1 year ago • 2 comments

Description

This PR adds support for running Kafka Connect on z/OS. It is in BETA: not everything has been tested, and some things don't work yet. I'm not familiar with your branching strategy or which branch this PR should target, so I would appreciate any pointers.

I've tested it using IBM zD&T and cp-env. A sister PR for cp-env is here: https://github.com/confluentinc/cp-test-env-manager/pull/136

Implementation details:

  • z/OS requires a set of environment variables to be set. cp-ansible provides a variable called proxy_env. Although it was intended only for specifying proxy details, all playbooks in fact apply it as the environment for every task. I've renamed it to env_vars to make explicit what the variable does. This is the only breaking change; all other changes are gated behind the ansible_os_family check.
  • z/OS has neither systemd nor a package manager, and it also lacks gunzip. So we download the Confluent tgz locally on your laptop, gunzip it there (without untarring it), and then copy the resulting tar to z/OS, where we untar it.
  • z/OS doesn't support Confluent Hub, so only local and remote connectors can be installed.
  • This is aimed primarily at testing IBM MQ connectors, which require IBM jars and .so files to be present on their classpath. So I've added a special hook to connector installation that recognizes IBM MQ connectors and copies the jars and .so files onto their classpath.
  • Because we're running in virtualized z/OS, I've added a health_check_hostname var to make sure the guest OS hits its guest IP rather than the host IP when doing health checks. I'll also run more tests without it to confirm that the issues I was having were not a fluke, but in any case this change is non-breaking.
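
For reviewers who want to see the "gunzip locally, untar on z/OS" flow in isolation, here is a minimal sketch in Python. It is not the code in this PR (the PR does this with Ansible tasks); the function names and paths are hypothetical, and both steps run on one machine here purely for illustration.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def gunzip_keep_tar(tgz_path: Path, tar_path: Path) -> Path:
    """Decompress a .tgz into a plain .tar without unpacking the archive.

    This mirrors the step done on the control machine (the laptop),
    since gunzip is not available on the z/OS side.
    """
    with gzip.open(tgz_path, "rb") as src, open(tar_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return tar_path


def untar(tar_path: Path, dest_dir: Path) -> None:
    """Unpack an uncompressed tar; stands in for the untar step on z/OS."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # mode "r:" accepts only uncompressed tar, so a still-gzipped
    # file fails loudly instead of being decompressed transparently.
    with tarfile.open(tar_path, mode="r:") as archive:
        archive.extractall(dest_dir)
```

In the actual playbook, the transfer of the tar to the z/OS host happens between these two steps.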

Limitations and unknowns:

  • SSL doesn't work. Although I have confirmed that the playbook creates keystores with the correct certificates, Connect still refuses to work correctly. It would be better if the Connect team handled this. We can document it as unsupported for now.
  • Confluent CLI doesn't work on z/OS (that's a product limitation), so any functionality that depends on it, e.g. secrets protection, won't work either.
  • I have not tested deploy_connectors.yml because it is not exposed in cp-env yet, but judging from the code it is all done via REST, so it should work fine.
  • I have tested RBAC with Kerberos, and it does work. However, when RBAC is enabled, Connect does not use Kerberos at all, so there's no guarantee that Kerberos itself would work. Because we're running in virtualized instances, there are multiple hostnames involved: one for the host OS and another for the guest OS. Kerberos relies heavily on hostnames, so there might be issues with that. I'll test it some more.

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)

  • [x] New feature (non-breaking change which adds functionality)

  • [x] Breaking change (fix or feature that would cause existing functionality to not work as expected)

  • [x] This change requires a documentation update

  • [ ] Any variable changes have been validated to be backwards compatible

Checklist:

  • [ ] My code follows the style guidelines of this project
  • [ ] I have performed a self-review of my own code
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] My changes generate no new warnings
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] Any dependent changes have been merged and published in downstream modules

stan-is-hate, Feb 02 '23 23:02