datahub icon indicating copy to clipboard operation
datahub copied to clipboard

fix(platform): Add aws-secretsmanager-jdbc driver in dependencies

Open atul-chegg opened this issue 2 years ago • 5 comments

Checklist

  • [x] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • [x] Links to related issues (if applicable)
  • [x] Tests for the changes have been added/updated (if applicable)
  • [x] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • [x] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

atul-chegg avatar Sep 16 '22 14:09 atul-chegg

Issue link https://github.com/datahub-project/datahub/issues/5872

atul-chegg avatar Sep 17 '22 05:09 atul-chegg

Unit Test Results (build & test)

562 tests   562 :heavy_check_mark:  13m 55s :stopwatch: 139 suites      0 :zzz: 139 files        0 :x:

Results for commit aefeed35.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Sep 19 '22 02:09 github-actions[bot]

Does adding the jar on the classpath automatically ensure that secrets are resolved from the secret store?

shirshanka avatar Sep 19 '22 17:09 shirshanka

@shirshanka

Does adding the jar on the classpath automatically ensure that secrets are resolved from the secret store?

Yes. Please see documentation at https://github.com/aws/aws-secretsmanager-jdbc We just need to change EBEAN_DATASOURCE_DRIVER to com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver for mysql and EBEAN_DATASOURCE_USERNAME to secret ID in env variables. ENV variable EBEAN_DATASOURCE_PASSWORD can be set to empty as it is not used. And scheme for EBEAN_DATASOURCE_URL should be "jdbc-secretsmanager:mysql:" And yes, container should have access on AWS Secret manager. How to grant that access is out of scope of this MR. In EKS, this access is usually granted using IRSA.

atul-chegg avatar Sep 19 '22 17:09 atul-chegg

I tested the following way.

  1. Checkout
git clone [email protected]:atul-chegg/datahub.git
git checkout add-aws-mysql-jdbc-driver
git merge master
  1. Build image. I use mac.
atul.atri@C02FD3A3MD6M datahub % docker buildx ls
NAME/NODE       DRIVER/ENDPOINT STATUS  BUILDKIT PLATFORMS
default *       docker                           
  default       default         running 20.10.17 linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
desktop-linux   docker                           
  desktop-linux desktop-linux   running 20.10.17 linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
atul.atri@C02FD3A3MD6M datahub % docker buildx create --use desktop-linux
frosty_williams
atul.atri@C02FD3A3MD6M datahub % docker buildx ls   
NAME/NODE          DRIVER/ENDPOINT  STATUS  BUILDKIT PLATFORMS
frosty_williams *  docker-container                  
  frosty_williams0 desktop-linux    running v0.10.4  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
default            docker                            
  default          default          running 20.10.17 linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
desktop-linux      docker                            
  desktop-linux    desktop-linux    running 20.10.17 linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
atul.atri@C02FD3A3MD6M datahub % docker buildx build -f ./docker/datahub-gms/Dockerfile --tag atulchegg/datahub-gms:6dc5e46eb --platform='linux/amd64,linux/arm64' .
<..output skipped..>
atul.atri@C02FD3A3MD6M datahub % docker images
REPOSITORY                             TAG               IMAGE ID       CREATED          SIZE
atulchegg/datahub-gms                  6dc5e46eb         b214d3d8105b   20 minutes ago   442MB
  1. Push image
atul.atri@C02FD3A3MD6M datahub % docker login
<..output skipped..>
atul.atri@C02FD3A3MD6M datahub % docker push atulchegg/datahub-gms:6dc5e46eb 
<..output skipped..>
  1. Update helm values for GMS service. Notice the following a. Image is changed to custom image containing aws-secretsmanager-jdbc b. Added EBEAN_DATASOURCE_* Env variables. This is not done in global values so other services are not affected. c. Env variable EBEAN_DATASOURCE_USERNAME points to secret ID in secret manager. The value for this secret ID should be JSON containing usernmae and password. {"username": "db-user-name", "password": "db-user-password"} d. Scheme in env variable EBEAN_DATASOURCE_URL changed to jdbc-secretsmanager:mysql: e. Env variable EBEAN_DATASOURCE_DRIVER is changed to com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver
datahub-gms:
  image:
    repository: atulchegg/datahub-gms
    tag: 6dc5e46eb
  extraVolumes:
      - name: secrets-store-inline
        csi:
          driver: secrets-store.csi.k8s.io
          readOnly: true
          volumeAttributes:
            secretProviderClass: datahub-secrets
  extraVolumeMounts:
    - name: secrets-store-inline
      mountPath: "/mnt/secrets-store"
      readOnly: true
  serviceAccount:
    annotations:
      "eks.amazonaws.com/role-arn": "${datahub_role_arn}"
  service:
    type: LoadBalancer
    port: 443
    targetPort: http
    protocol: TCP
    name: https
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      service.beta.kubernetes.io/aws-load-balancer-scheme: internal
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: ${acm_certificate_id}
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
      service.beta.kubernetes.io/aws-load-balancer-security-groups: ${datahub_frontend_sg_id}
      service.beta.kubernetes.io/aws-load-balancer-subnets: ${datahub_frontend_lb_subnets}
  podAnnotations:
    prometheus.io/scrape: 'true'
    prometheus.io/path: '/metrics'
    prometheus.io/port: '4318'
  extraEnvs:
    - name: EBEAN_DATASOURCE_USERNAME
      value: "test/datahub/mysql/datahub-db/datahub"
    - name: EBEAN_DATASOURCE_PASSWORD
      value: "dummy-pass-not-used"
    - name: EBEAN_DATASOURCE_HOST
      value: "${mysql_server}:3306"
    - name: EBEAN_DATASOURCE_URL
      value: "jdbc-secretsmanager:mysql://${mysql_server}:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8&enabledTLSProtocols=TLSv1.2"
    - name: EBEAN_DATASOURCE_DRIVER
      value: "com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver"

It works very well. It picks DB username and password from secret manager those are automatically updated when credentials in secret manager are rotated.

@shirshanka Please let me know if this MR can be reviewed and merged. Please let me know if any more work is needed from my side on this.

atul-chegg avatar Sep 21 '22 12:09 atul-chegg

Running CI. I think the PR looks okay. Will go ahead and merge once CI passes.

jjoyce0510 avatar Sep 21 '22 21:09 jjoyce0510

@jjoyce0510 Please let me know if it can be merged now

atul-chegg avatar Sep 22 '22 18:09 atul-chegg

@shirshanka

Thank you for accepting my MR.

About external jars,

I really do not know how you are going to implement this. But I did some testing.

I started datahub-gms container with sleep 7200 command so that I can login to the container. After SSHing to the container,

  1. I downloaded aws-secretsmanager-jdbc and all it's dependent libraries.
cd /home/datahub
curl -L -O http://search.maven.org/remotecontent?filepath=org/apache/ivy/ivy/2.5.0/ivy-2.5.0.jar
java -jar ivy-2.5.0.jar -dependency com.amazonaws.secretsmanager aws-secretsmanager-jdbc 1.0.8 -retrieve "/home/datahub/dependencies/lib/[artifact]-[revision](-[classifier]).[ext]"
  1. Then I tried to start datahub gms service with the following command. I added --lib option
java -javaagent:jmx_prometheus_javaagent.jar=4318:/datahub/datahub-gms/scripts/prometheus-config.yaml -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --lib /home/datahub/dependencies/lib --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war

But it did not work because of two reasons

  1. aws-secretsmanager-jdbc documentation mentions that real driver ("com.mysql.cj.jdbc.Driver") must be pre-registered (issue: 44). "com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver" is just a wrapper that under the hood uses "com.mysql.cj.jdbc.Driver".
  2. Jetty started reporting multiple versions. Gradle, at the time of packaging, resolves dependencies and download correct version of dependent library. But in my testing, I downloaded these dependencies externally, so it caused some multiple version issue.

I hope this information helps datahub team when they implement external jars functionality.

atul-chegg avatar Sep 23 '22 06:09 atul-chegg