Separate truststore/keystore creation for multiple services on the same host

Open DennisFederico opened this issue 2 years ago • 5 comments

Description

When multiple services run on the same host, in particular multiple Connect workers, they manipulate the same certificate files and keystores concurrently, and race conditions can occur.

(Screenshot of the error attached: "Ansible error creating stores")

The following inventory excerpt shares hosts between Connect services:

```yaml
all:
  vars:

    ...

    ssl_enabled: true
    ssl_custom_certs: true
    ssl_ca_cert_filepath: ~/inventories/gcp-sandbox/ssl/generated/CAcert.pem
    ssl_signed_cert_filepath: ~/inventories/gcp-sandbox/ssl/generated/server.pem
    ssl_key_filepath: ~/inventories/gcp-sandbox/ssl/generated/server-key.pem
    regenerate_keystore_and_truststore: true

...

kafka_connect:
  vars:
    hostname_aliasing_enabled: true
  children:
    connect-main:
      vars:
        kafka_connect_cluster_name: connect-main
        kafka_connect_group_id: connect-main
        #kafka_connect_service_name: connect-main
        kafka_connect_config_filename: connect-main-distributed.properties
        kafka_connect_log_dir: "{{ kafka_connect_default_log_dir }}/connect-main"
      hosts:
        dfederico-demo-connect-0:
        dfederico-demo-connect-1:

    connect-spawn1:
      vars:
        kafka_connect_cluster_name: connect-spawn1
        kafka_connect_group_id: connect-spawn1
        kafka_connect_service_name: connect-spawn1
        kafka_connect_config_filename: connect-spawn1-distributed.properties
        kafka_connect_log_dir: "{{ kafka_connect_default_log_dir }}/connect-spawn1"
      hosts:
        dfederico-demo-connect-2.A:
          ansible_host: dfederico-demo-connect-2
          hostname: dfederico-demo-connect-2
        dfederico-demo-connect-3.A:
          ansible_host: dfederico-demo-connect-3
          hostname: dfederico-demo-connect-3

    connect-spawn2:
      vars:
        kafka_connect_cluster_name: connect-spawn2
        kafka_connect_group_id: connect-spawn2
        kafka_connect_service_name: connect-spawn2
        kafka_connect_config_filename: connect-spawn2-distributed.properties
        kafka_connect_rest_port: 8084
        kafka_connect_jmxexporter_port: 8078
        kafka_connect_log_dir: "{{ kafka_connect_default_log_dir }}/connect-spawn2"
      hosts:
        dfederico-demo-connect-2.B:
          ansible_host: dfederico-demo-connect-2
          hostname: dfederico-demo-connect-2
        dfederico-demo-connect-3.B:
          ansible_host: dfederico-demo-connect-3
          hostname: dfederico-demo-connect-3
```

With the proposed change, the service name is used as the filename prefix for all related artifacts when the ssl role manipulates custom certificates and stores.

As stated, this change is needed to properly install multiple services on the same host. It targets kafka_connect, since that is the only service with such an example: 'sample_inventories/multi_connect_workers_on_single_node.yml'.
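
The prefixing idea can be sketched roughly as follows. This is only an illustration under assumptions: `example_ssl_dir`, `example_keystore_path` and `example_truststore_path` are hypothetical names invented for this sketch, not variables taken from the ssl role; only `kafka_connect_service_name` and the '/var/ssl/private' directory appear in this PR.

```yaml
# Hypothetical variables, for illustration only (not the role's real names).
# Each store path is derived from the service name, so two Connect workers
# on the same host write to different files instead of racing on one.
example_ssl_dir: /var/ssl/private
example_keystore_path: "{{ example_ssl_dir }}/{{ kafka_connect_service_name }}.keystore.jks"
example_truststore_path: "{{ example_ssl_dir }}/{{ kafka_connect_service_name }}.truststore.jks"
```

With the inventory above, `connect-spawn1` and `connect-spawn2` would each resolve to their own pair of files on `dfederico-demo-connect-2`, which is what removes the shared-file contention.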

Type of change

  • [X] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] This change requires a documentation update

How Has This Been Tested?

Ran in a cluster with different custom certificates, both encrypted and non-encrypted, and checked the results in the '/var/ssl/private' folder as well as the generated properties for the services (both standard and host-sharing). (Screenshot attached: "generated stores per service")

Also checked that Control Center can see the clusters properly.

Checklist:

  • [X] My code follows the style guidelines of this project
  • [X] I have performed a self-review of my own code
  • [X] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [X] My changes generate no new warnings
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] Any dependent changes have been merged and published in downstream modules
  • [ ] Any variable changes have been validated to be backwards compatible

DennisFederico avatar Jun 08 '22 14:06 DennisFederico

Thanks for the PR, @DennisFederico. Could you please also add or update a molecule scenario to cover these changes? Thanks!

nsharma-git avatar Jun 14 '22 10:06 nsharma-git

I would gladly do it, @nsharma-git, but I get errors running molecule. I'm sure it is something in my environment. Do you know of anyone I can ask for help setting up molecule?

DennisFederico avatar Jun 15 '22 08:06 DennisFederico

We would be happy to help you. You can reach out on the #ansible or #ansible-oncall Slack channels, or ping any of our team members: @nsharma-git @anuj-apdev @utkarsh5474

nsharma-git avatar Jun 15 '22 09:06 nsharma-git

Thank you for your support over Slack, but I'm still having problems running systemd containers with Docker Desktop for macOS. Even with privileged mode, tmpfs, tweaking the /sys/fs/cgroup volume, etc., I still get D-Bus errors like:

failure 1 during daemon-reload: Failed to connect to bus: No such file or directory\n

I'll keep tinkering and try to run molecule tests in the future, because I'm not sure when I will have this Docker issue sorted out, or whether it's within my reach to fix it.

I found some blog posts and documentation that may help, but again, I'm constrained by time and other tasks :(
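
For readers following along, the settings mentioned above (privileged mode, tmpfs, the /sys/fs/cgroup volume) typically map to a molecule Docker platform entry along these lines. This is a sketch, not the scenario actually used here: the instance name, image and init command are placeholders that depend on the image chosen.

```yaml
# molecule.yml, platforms section only (illustrative sketch)
platforms:
  - name: connect-test              # hypothetical instance name
    image: example/systemd-image    # hypothetical image that ships systemd
    command: /usr/sbin/init         # assumed init path; varies by image
    privileged: true                # "privileged mode"
    tmpfs:
      - /run
      - /tmp
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
```

Whether this is sufficient depends on the Docker Desktop version and its cgroup setup, so treat it as a starting point rather than a fix.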

DennisFederico avatar Jun 16 '22 09:06 DennisFederico

failure 1 during daemon-reload: Failed to connect to bus: No such file or directory\n

I had the same problem on my Mac just now, and got past it by downgrading Docker Desktop for Mac from 4.9.0 to 4.2.0. It's this issue on Docker: https://github.com/docker/for-mac/issues/6073

4.2.0 may be found here: https://docs.docker.com/desktop/release-notes/#docker-desktop-420

sverrehu avatar Jun 16 '22 12:06 sverrehu