datahub
fix(platform): Add aws-secretsmanager-jdbc driver in dependencies
Checklist
- [x] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
- [x] Links to related issues (if applicable)
- [x] Tests for the changes have been added/updated (if applicable)
- [x] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
- [x] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub
Issue link https://github.com/datahub-project/datahub/issues/5872
Unit Test Results (build & test)
562 tests 562 :heavy_check_mark: 13m 55s :stopwatch: 139 suites 0 :zzz: 139 files 0 :x:
Results for commit aefeed35.
:recycle: This comment has been updated with latest results.
Does adding the jar on the classpath automatically ensure that secrets are resolved from the secret store?
@shirshanka
> Does adding the jar on the classpath automatically ensure that secrets are resolved from the secret store?
Yes. Please see the documentation at https://github.com/aws/aws-secretsmanager-jdbc
For MySQL, we just need to set the env variable EBEAN_DATASOURCE_DRIVER to com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver
and EBEAN_DATASOURCE_USERNAME to the secret ID. EBEAN_DATASOURCE_PASSWORD can be left empty because it is not used. The scheme for EBEAN_DATASOURCE_URL should be "jdbc-secretsmanager:mysql:".
And yes, the container must have access to AWS Secrets Manager. How to grant that access is out of scope for this MR. In EKS, this access is usually granted using IRSA.
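As a rough illustration of the settings described above, the following sketch derives the wrapper-driver URL from a plain MySQL JDBC URL and exports the four variables. The helper `to_secretsmanager_url`, the host `db.example.internal`, and the secret ID `my/secret/id` are placeholders introduced for this example, not part of DataHub.

```shell
#!/bin/sh
# Sketch only: to_secretsmanager_url is a hypothetical helper, not a DataHub tool.
# It swaps the "jdbc:" scheme prefix for "jdbc-secretsmanager:" on MySQL URLs.
to_secretsmanager_url() {
  case "$1" in
    jdbc:mysql:*) printf 'jdbc-secretsmanager:%s\n' "${1#jdbc:}" ;;
    *) echo "not a jdbc:mysql: URL: $1" >&2; return 1 ;;
  esac
}

# Placeholder values; replace the host, database, and secret ID with your own.
EBEAN_DATASOURCE_DRIVER="com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver"
EBEAN_DATASOURCE_USERNAME="my/secret/id"   # a secret ID, not a database user
EBEAN_DATASOURCE_PASSWORD=""               # not used by the wrapper driver
EBEAN_DATASOURCE_URL="$(to_secretsmanager_url 'jdbc:mysql://db.example.internal:3306/datahub')"
export EBEAN_DATASOURCE_DRIVER EBEAN_DATASOURCE_USERNAME \
       EBEAN_DATASOURCE_PASSWORD EBEAN_DATASOURCE_URL
```

Only the scheme prefix changes; the host, port, database, and query parameters of the original URL are kept as they are.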
I tested the following way.
- Checkout
git clone git@github.com:atul-chegg/datahub.git
git checkout add-aws-mysql-jdbc-driver
git merge master
- Build the image. I use a Mac.
atul.atri@C02FD3A3MD6M datahub % docker buildx ls
NAME/NODE DRIVER/ENDPOINT STATUS BUILDKIT PLATFORMS
default * docker
default default running 20.10.17 linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
desktop-linux docker
desktop-linux desktop-linux running 20.10.17 linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
atul.atri@C02FD3A3MD6M datahub % docker buildx create --use desktop-linux
frosty_williams
atul.atri@C02FD3A3MD6M datahub % docker buildx ls
NAME/NODE DRIVER/ENDPOINT STATUS BUILDKIT PLATFORMS
frosty_williams * docker-container
frosty_williams0 desktop-linux running v0.10.4 linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
default docker
default default running 20.10.17 linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
desktop-linux docker
desktop-linux desktop-linux running 20.10.17 linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
atul.atri@C02FD3A3MD6M datahub % docker buildx build -f ./docker/datahub-gms/Dockerfile --tag atulchegg/datahub-gms:6dc5e46eb --platform='linux/amd64,linux/arm64' .
<..output skipped..>
atul.atri@C02FD3A3MD6M datahub % docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
atulchegg/datahub-gms 6dc5e46eb b214d3d8105b 20 minutes ago 442MB
- Push image
atul.atri@C02FD3A3MD6M datahub % docker login
<..output skipped..>
atul.atri@C02FD3A3MD6M datahub % docker push atulchegg/datahub-gms:6dc5e46eb
<..output skipped..>
- Update the Helm values for the GMS service. Notice the following:
a. The image is changed to a custom image containing aws-secretsmanager-jdbc.
b. Added EBEAN_DATASOURCE_* env variables. This is not done in the global values, so other services are not affected.
c. Env variable EBEAN_DATASOURCE_USERNAME points to the secret ID in Secrets Manager. The value of this secret should be JSON containing the username and password: {"username": "db-user-name", "password": "db-user-password"}
d. The scheme in env variable EBEAN_DATASOURCE_URL is changed to jdbc-secretsmanager:mysql:
e. Env variable EBEAN_DATASOURCE_DRIVER is changed to com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver
datahub-gms:
  image:
    repository: atulchegg/datahub-gms
    tag: 6dc5e46eb
  extraVolumes:
    - name: secrets-store-inline
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: datahub-secrets
  extraVolumeMounts:
    - name: secrets-store-inline
      mountPath: "/mnt/secrets-store"
      readOnly: true
  serviceAccount:
    annotations:
      "eks.amazonaws.com/role-arn": "${datahub_role_arn}"
  service:
    type: LoadBalancer
    port: 443
    targetPort: http
    protocol: TCP
    name: https
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      service.beta.kubernetes.io/aws-load-balancer-scheme: internal
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: ${acm_certificate_id}
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
      service.beta.kubernetes.io/aws-load-balancer-security-groups: ${datahub_frontend_sg_id}
      service.beta.kubernetes.io/aws-load-balancer-subnets: ${datahub_frontend_lb_subnets}
  podAnnotations:
    prometheus.io/scrape: 'true'
    prometheus.io/path: '/metrics'
    prometheus.io/port: '4318'
  extraEnvs:
    - name: EBEAN_DATASOURCE_USERNAME
      value: "test/datahub/mysql/datahub-db/datahub"
    - name: EBEAN_DATASOURCE_PASSWORD
      value: "dummy-pass-not-used"
    - name: EBEAN_DATASOURCE_HOST
      value: "${mysql_server}:3306"
    - name: EBEAN_DATASOURCE_URL
      value: "jdbc-secretsmanager:mysql://${mysql_server}:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8&enabledTLSProtocols=TLSv1.2"
    - name: EBEAN_DATASOURCE_DRIVER
      value: "com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver"
It works very well. It picks up the DB username and password from Secrets Manager, and they are refreshed automatically when the credentials in Secrets Manager are rotated.
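Before pointing the driver at a secret, it may help to confirm that the secret value has the JSON shape mentioned earlier, with "username" and "password" keys. The sketch below is one way to do that; `has_db_credentials` is a hypothetical helper written for this example, and it only does a shallow textual check rather than full JSON validation.

```shell
#!/bin/sh
# Sketch only: has_db_credentials is a hypothetical helper. It checks that both
# key names appear in the secret string; it does not parse the JSON.
has_db_credentials() {
  printf '%s' "$1" | grep -q '"username"[[:space:]]*:' &&
  printf '%s' "$1" | grep -q '"password"[[:space:]]*:'
}

# Example with a literal payload in the expected shape.
secret='{"username": "db-user-name", "password": "db-user-password"}'
if has_db_credentials "$secret"; then
  echo "secret has username and password"
else
  echo "secret is missing username or password" >&2
fi

# With the AWS CLI installed and credentials configured, the payload could be
# fetched like this instead (secret ID taken from the Helm values in this thread):
#   secret="$(aws secretsmanager get-secret-value \
#     --secret-id test/datahub/mysql/datahub-db/datahub \
#     --query SecretString --output text)"
```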
@shirshanka Please let me know if this MR can be reviewed and merged. Please let me know if any more work is needed from my side on this.
Running CI. I think the PR looks okay. Will go ahead and merge once CI passes.
@jjoyce0510 Please let me know if it can be merged now
@shirshanka
Thank you for accepting my MR.
About external jars:
I really do not know how you are going to implement this, but I did some testing.
I started the datahub-gms container with the command sleep 7200 so that I could log in to it. After logging in to the container:
- I downloaded aws-secretsmanager-jdbc and all its dependent libraries.
cd /home/datahub
curl -L -O http://search.maven.org/remotecontent?filepath=org/apache/ivy/ivy/2.5.0/ivy-2.5.0.jar
java -jar ivy-2.5.0.jar -dependency com.amazonaws.secretsmanager aws-secretsmanager-jdbc 1.0.8 -retrieve "/home/datahub/dependencies/lib/[artifact]-[revision](-[classifier]).[ext]"
- Then I tried to start the datahub-gms service with the following command, to which I added the --lib option:
java -javaagent:jmx_prometheus_javaagent.jar=4318:/datahub/datahub-gms/scripts/prometheus-config.yaml -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --lib /home/datahub/dependencies/lib --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war
But it did not work, for two reasons:
- The aws-secretsmanager-jdbc documentation mentions that the real driver ("com.mysql.cj.jdbc.Driver") must be pre-registered (issue: 44). "com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver" is just a wrapper that uses "com.mysql.cj.jdbc.Driver" under the hood.
- Jetty started reporting multiple versions of some libraries. At packaging time, Gradle resolves dependencies and downloads the correct version of each dependent library. In my testing I downloaded these dependencies externally, which caused the version conflicts.
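One way to spot this kind of version conflict ahead of time is to list artifacts that show up under more than one file name across the jar directories involved. The sketch below assumes jar names follow the Ivy retrieve pattern used above ([artifact]-[revision](-[classifier]).jar, with the revision starting with a digit); `duplicate_artifacts` is a hypothetical helper, not a Jetty or DataHub tool.

```shell
#!/bin/sh
# Sketch only: duplicate_artifacts is a hypothetical helper.
# Prints each artifact name that occurs with more than one distinct jar file
# name across the given directories.
duplicate_artifacts() {
  for dir in "$@"; do
    for jar in "$dir"/*.jar; do
      [ -e "$jar" ] || continue   # skip the unexpanded glob when a dir has no jars
      basename "$jar"
    done
  done | sort -u |
    # Drop the trailing "-<revision>[-<classifier>].jar" to leave the artifact name.
    sed -E 's/-[0-9][^-]*(-[A-Za-z0-9.]+)?\.jar$//' |
    sort | uniq -d
}

# Directory used in the test described above.
duplicate_artifacts /home/datahub/dependencies/lib
```

To compare against the jars packaged in the WAR, its WEB-INF/lib directory could be extracted and passed as a second argument; identical file names in both directories are counted once, so only differing versions are reported.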
I hope this information helps the DataHub team when they implement the external jars functionality.