ArcticDB

Certificate path detection is incorrect when using AWS STS (e.g. with AWS EKS)

Open • poodlewars opened this issue 1 year ago • 7 comments

Describe the bug

For Azure, we detect the correct certificate locations (using the OpenSSL Python library) and pass them to the Azure SDK.

For S3, we use the system default. However, because we build on manylinux and statically link libcurl and OpenSSL, the "system default" we end up using is CentOS's, which can cause problems when running on other Linux distributions.

Example failing flow:

  • Use passwordless authentication
  • Assume IAM role using AWS STS
  • SSL verification against STS fails

For S3, we should use the same certificate location detection logic that we use for Azure.
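A minimal Python sketch of what such detection could look like is below. It assumes the standard-library `ssl` module as a stand-in for the "OpenSSL Python library" used on the Azure side; `KNOWN_CA_FILES`, `pick_first_existing`, and `detect_ca_locations` are hypothetical names, not ArcticDB's actual implementation. It prefers OpenSSL's own defaults and falls back to well-known distribution paths when those defaults don't exist on the host. Note that `ssl.get_default_verify_paths()` describes the OpenSSL that Python itself links against, not the copy statically linked into the wheel.

```python
import os
import ssl
from typing import Callable, Iterable, Optional, Tuple

# Common CA bundle locations on popular Linux distributions
# (illustrative, not exhaustive).
KNOWN_CA_FILES = (
    "/etc/ssl/certs/ca-certificates.crt",  # Debian, Ubuntu
    "/etc/pki/tls/certs/ca-bundle.crt",    # RHEL, CentOS, Fedora, Amazon Linux
    "/etc/ssl/ca-bundle.pem",              # openSUSE
)


def pick_first_existing(
    candidates: Iterable[str],
    is_file: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return the first candidate path that exists, or None."""
    return next((path for path in candidates if is_file(path)), None)


def detect_ca_locations() -> Tuple[Optional[str], Optional[str]]:
    """Return (ca_file, ca_path) suitable for an SDK client configuration."""
    defaults = ssl.get_default_verify_paths()
    # cafile/capath are None when OpenSSL's default file or directory does
    # not exist on this machine; SSL_CERT_FILE / SSL_CERT_DIR override them.
    ca_file = defaults.cafile or pick_first_existing(KNOWN_CA_FILES)
    return ca_file, defaults.capath


if __name__ == "__main__":
    ca_file, ca_path = detect_ca_locations()
    print(f"caFile={ca_file!r} caPath={ca_path!r}")
```

The fallback list matters because a compiled-in default (for example, a CentOS path) can simply be absent on another distribution, in which case the resolved `cafile` comes back as `None`.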

Alex Seaton can add you to a private thread with more context on this: https://arcticdb.slack.com/archives/C064NA7BK5H/p1701703865582509

poodlewars, Dec 05 '23 15:12

We should set up our own AWS EKS cluster so we can test this properly.

poodlewars, Dec 07 '23 10:12

Development plan:

  1. Verify that verifySSL works with S3
  2. Make EKS work
  3. Add unit tests for both

phoebusm, Dec 21 '23 14:12

I've been setting up a test AWS EKS cluster to help with this. I've put the deployment files here: https://github.com/poodlewars/scratch/tree/k8s-files

.
├── deployment.apps
│   └── eks-sample-linux-deployment.yaml
├── service
│   └── eks-sample-linux-service.yaml
└── serviceaccount
    └── k8s-svc2.yaml

On Amazon Linux, this works out of the box, as it uses RHEL-style certificate locations, which match the CentOS defaults our build ends up with.

With an Ubuntu image, this fails. Interestingly, it still fails with a Conda-based install (mamba), even though we dynamically link OpenSSL there.

I've set up a cluster, alex-cluster-fargate, in our AWS account. You can point kubectl at it by following these notes: https://docs.aws.amazon.com/eks/latest/userguide/create-kubeconfig.html

I mostly followed the Fargate version of these notes, https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html, and then applied the configs above.

poodlewars, Jan 02 '24 11:01

The test passes with a Conda installation on both CentOS and Ubuntu. Possible reasons:

  1. libcurl and OpenSSL are dynamically linked
  2. OpenSSL in Conda is configured with an OS-independent CA file location, e.g. cafile='/root/miniforge3/envs/py310/ssl/cert.pem'
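Reason 2 can be checked on a given environment from Python. The sketch below uses only the standard-library `ssl` module; `describe_openssl_defaults` is a hypothetical helper. In a Conda environment, Python typically links against Conda's OpenSSL, so `compiled_cafile` should show an environment-relative path like the one above, whereas a system Python reports the distribution's defaults. It does not inspect an OpenSSL statically linked into a PyPI wheel.

```python
import ssl


def describe_openssl_defaults() -> dict:
    """Summarise where this interpreter's OpenSSL looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        # Locations compiled into OpenSSL at build time.
        "compiled_cafile": paths.openssl_cafile,
        "compiled_capath": paths.openssl_capath,
        # Environment variables that override the compiled-in locations.
        "cafile_env_var": paths.openssl_cafile_env,
        "capath_env_var": paths.openssl_capath_env,
        # Effective locations after applying overrides; None if missing on disk.
        "effective_cafile": paths.cafile,
        "effective_capath": paths.capath,
    }


if __name__ == "__main__":
    for key, value in describe_openssl_defaults().items():
        print(f"{key}: {value}")
```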

phoebusm, Mar 18 '24 03:03

At the moment, the S3 SDK doesn't allow manually setting the CA certificate path for EKS, because the corresponding settings are not passed to the STS credentials provider: https://github.com/aws/aws-sdk-cpp/blob/e9d0d247be909ade39f213a3e2915aa262755a78/src/aws-cpp-sdk-core/source/auth/STSCredentialsProvider.cpp#L110

I can think of three ways to fix the problem:

  1. Use the currently unused https://github.com/man-group/ArcticDB/blob/9c91bd6c1100981dedc5b3772aca718f1f39f8eb/cpp/arcticdb/storage/s3/aws_provider_chain.cpp#L30 as the basis for our own class derived from AWSCredentialsProvider that supports setting the CA path
  2. Dynamically link libcurl in the PyPI build, as we do in Conda
  3. Patch the S3 SDK to support loading the CA path setting. The easiest way is probably via an environment variable.

phoebusm, Mar 18 '24 03:03

We are going to dynamically link libcurl in the vcpkg build. A few things need to be done:

  • [ ] Which version of OpenSSL should we dynamically link to?
    • OpenSSL only maintains ABI compatibility within a major version
    • We use 1.1.1 in vcpkg; the version is not pinned in Conda (currently >3)
  • [ ] How do we strip the OpenSSL library in auditwheel?
  • [x] Make vcpkg dynamically link only OpenSSL
  • [ ] Test whether dynamic linking solves the problem we face with AWS STS
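The ABI question in the first item can be made concrete with a small Python check. This is a sketch, assuming OpenSSL's stated policy: from 3.0 onwards releases are ABI-compatible within a major version (libssl.so.3), while earlier releases shared an ABI per major.minor series (1.1.0 and 1.1.1 both ship libssl.so.1.1). `abi_series` and `abi_compatible` are hypothetical helpers, and the runtime check only sees the OpenSSL that Python links against.

```python
import ssl
from typing import Tuple


def abi_series(version: Tuple[int, ...]) -> Tuple[int, ...]:
    """Return the part of an OpenSSL version that identifies its ABI series."""
    major, minor = version[0], version[1]
    # 3.x and later: major version only. Pre-3.0: major.minor, e.g. (1, 1).
    return (major,) if major >= 3 else (major, minor)


def abi_compatible(built_against: Tuple[int, ...], runtime: Tuple[int, ...]) -> bool:
    """True if a library built against one version can load the other."""
    return abi_series(built_against) == abi_series(runtime)


if __name__ == "__main__":
    # Compare the vcpkg baseline (1.1.1) with the OpenSSL this Python uses.
    runtime = ssl.OPENSSL_VERSION_INFO[:3]
    print(ssl.OPENSSL_VERSION, abi_compatible((1, 1, 1), runtime))
```

Under this rule a wheel built against 1.1.1 cannot load a Conda-style OpenSSL 3.x at runtime, which is why the choice of version to link against has to be made up front.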

phoebusm, Mar 22 '24 13:03


The plan was scrapped because the decision would be irreversible and would create maintenance debt for future versions of OpenSSL. Instead, we will patch the S3 SDK's STS authentication class so that it follows the caPath and caFile settings in the client config:

https://github.com/aws/aws-sdk-cpp/issues/2920

phoebusm, Apr 11 '24 14:04