kafka-connect-elasticsearch
Support AWS Elasticsearch Auth
AWS Elasticsearch implements a custom signature method to authenticate users [0].
It would be nice to be able to use this connector to move data into AWS Elasticsearch clusters that require authentication.
[0] - https://aws.amazon.com/blogs/security/how-to-control-access-to-your-amazon-elasticsearch-service-domain/
This would be really useful. I'm wondering how to implement this without adding dependencies on a bunch of AWS libraries that most people will not need or want on their classpath.
Perhaps with a new property that specifies the class name of a request interceptor? This property could then be set to the class name of an AWS request interceptor (like this one: https://github.com/inreachventures/aws-signing-request-interceptor), which adds the required AWS authentication to the ES requests. If you are using AWS's ES, you can drop the required jars onto your classpath and specify the request interceptor in your ES connector config. It's a little cumbersome, so I'm open to other ideas.
I have forked this repo in order to add AWS request signing, but I would like to contribute a solution upstream so I don't need to maintain a separate fork just for AWS's auth stuff.
@thomasdziedzic @zzbennett Definitely seems like a good idea -- I think this will be a matter of exposing a few more configs that are specific to AWS and then wiring up the auth pieces. There's an example of how to do the auth steps in this Jest issue and https://github.com/confluentinc/kafka-connect-elasticsearch/pull/77 is working on adding basic authentication support. If anyone is interested in taking a stab, I'd be happy to guide development and review a PR!
@ewencp I'd be happy to take a stab at this. I've got the code already and it is running well so far in our prototype connect deployment. I'll just productionalize it a bit and put up a PR for discussion.
Okay, so I'm back to working on the ES connector. I've been mulling this over and although the modifications involved for supporting the AWS authentication are simple, implementing them in a "pluggable" way is somewhat trickier.
Inspired by the pluggable partitioners and formatters in the S3/HDFS connector, this is a possible solution:
Abstract the ES client logic. Currently the connector depends directly on the JestClient and the JestClientFactory. Rather than depending directly on the JestClient for executing ES requests, we could add an ESClient interface and a default implementation that will use the current JestClient logic. A config would be added containing the classname of the ESClient implementation, which would get instantiated using reflection. Most people would use the default for this config, but for people needing the AWS auth (or any kind of special logic around querying ES), they could plop an implementation of the ESClient on their classpath that provides the AWS authentication and change the ESClient classname config. The downsides are it requires a new config that most people won't need to touch, and handling pluggability this way can get a bit unwieldy. It does give users complete control over how the connector queries ES, which could be useful, like if they are doing something fancy like routing to different ES clusters.
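The pluggable-client idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `ESClient` interface, the `DefaultESClient` stand-in, the `ESClientLoader` helper, and the `es.client.class` config key are all hypothetical names, not part of the connector, and the default implementation only describes the request it would send instead of calling Jest.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction over the Elasticsearch client (not an existing connector API).
interface ESClient {
    void configure(Map<String, String> config);

    // Returns a description of the request; a real implementation would execute it.
    String index(String indexName, String documentJson);
}

// Stand-in for the current Jest-based logic.
class DefaultESClient implements ESClient {
    private String url;

    @Override
    public void configure(Map<String, String> config) {
        url = config.getOrDefault("connection.url", "http://localhost:9200");
    }

    @Override
    public String index(String indexName, String documentJson) {
        return "POST " + url + "/" + indexName + "/_doc";
    }
}

final class ESClientLoader {
    // Hypothetical config key holding the implementation's class name.
    static final String CLIENT_CLASS_CONFIG = "es.client.class";

    static ESClient load(Map<String, String> config) {
        String className =
            config.getOrDefault(CLIENT_CLASS_CONFIG, DefaultESClient.class.getName());
        try {
            // Instantiate via reflection; the class needs a no-argument constructor.
            ESClient client = Class.forName(className)
                .asSubclass(ESClient.class)
                .getDeclaredConstructor()
                .newInstance();
            client.configure(config);
            return client;
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot instantiate ESClient: " + className, e);
        }
    }
}

class ESClientDemo {
    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("connection.url", "http://es.example:9200");
        // No es.client.class set, so the default implementation is used.
        ESClient client = ESClientLoader.load(config);
        System.out.println(client.index("logs", "{}"));
    }
}
```

An AWS-signing implementation would then live in a separate jar, and users who need it would set `es.client.class` to its class name; everyone else keeps the default and never sees the AWS dependencies.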
Honestly though, for this particular issue it might make more sense to stand up a reverse proxy that will handle the authentication. AWS's ES can do IP based access control, so you could just set up a vanilla nginx reverse proxy and whitelist its IP. Or you could set the proxy up with this.
I guess it boils down to whether it is worth abstracting the ESClient or not. If the ESClient abstraction makes sense for purposes besides AWS authentication, then handling authentication that way could be easier; otherwise, the reverse proxy is probably the way to go.
Have there been any updates on this issue?
Since I created this issue, AWS has released VPC-based Elasticsearch clusters, which don't require request signing, so there isn't as much need for this feature anymore.
We're using a secured Elasticsearch in production, and now we need to sink some topics into it using Kafka Connect (we've been doing this with Spark Streaming). @zzbennett I think this feature is needed. How can I help? @ewencp @thomasdziedzic @jdsiddon @zzbennett so we can move forward with this PR.
@elarib What do you mean exactly by secure Elasticsearch? Is your Elasticsearch cluster deployed in AWS? And if so, is it deployed in a VPC? Or do you access it over the public internet?
@zzbennett Yesterday I created a pull request describing this use case: https://github.com/confluentinc/kafka-connect-elasticsearch/pull/185 There are use cases for securing ES so we can have multitenancy, using Elasticsearch X-Pack or Search Guard.
#216 implements basic auth via the JEST client. Does that satisfy this request? If so, we can close this issue.
Anyone working on a PR for this? Planning to do so myself if not...
We have a company policy that requires signing as per https://docs.aws.amazon.com/general/latest/gr/signature-v4-examples.html
Perhaps a fork specific to AWS Elasticsearch, to avoid adding AWS dependencies to this connector in general? Seems a bit heavyweight either way.
Hi, regarding AWS-specific security support, I agree with @joncourt's approach here. AWS has a lot of options (and the dependencies they require) that are specific to AWS.
The important bit, as @joncourt already pointed out, is that you have to issue Signature Version 4 signed requests, which means wrapping every interaction with the search engine. This is of no benefit to any other Elasticsearch installation.
Access control is done with IAM policies, which allow or deny HTTP verbs against resources. These policies let you authorise based on identity as well as on source, etc. The signature and the policies together do the work of authorisation, at least as I understand it.
From their blog:
A note about authentication, which applies to both types of policies: you can use two strategies to authenticate Amazon ES requests. The first is based on the originating IP address. You can omit the Principal from your policy and specify an IP Condition. In this case, and barring a conflicting policy, any call from that IP address will be allowed access or be denied access to the resource in question. The second strategy is based on the originating Principal. In this case, you are required to include information that AWS can use to authenticate the requestor as part of every request to your Amazon ES endpoint, which you accomplish by signing the request using Signature Version 4. Later in this post, I provide an example of how you can sign a simple request against Amazon ES using Signature Version 4.
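For reference, the signing-key derivation at the core of Signature Version 4 can be sketched with the JDK alone. This is a partial sketch, not a complete signer: building the canonical request and the string to sign is omitted, the `SigV4` class and its method names are made up for illustration, and the credentials in the example are placeholders. The HMAC chain itself (secret prefixed with `AWS4`, then date, region, service, and the literal `aws4_request`) follows the AWS documentation; `es` is the service name used for Amazon ES.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustrative helper; the class and method names are not from any AWS library.
final class SigV4 {
    static byte[] hmacSha256(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    // Derives the signing key: each step keys the next HMAC with the previous result.
    // dateStamp is the request date in yyyyMMdd form, e.g. "20120215".
    static byte[] signingKey(String secretKey, String dateStamp, String region, String service) {
        byte[] kSecret = ("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8);
        byte[] kDate = hmacSha256(kSecret, dateStamp);
        byte[] kRegion = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, service);
        return hmacSha256(kService, "aws4_request");
    }

    // Signs an already-built string to sign and returns the lowercase hex signature.
    static String sign(byte[] signingKey, String stringToSign) {
        StringBuilder hex = new StringBuilder();
        for (byte b : hmacSha256(signingKey, stringToSign)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}

class SigV4Demo {
    public static void main(String[] args) {
        // Placeholder secret; a real signer would read it from a credentials provider.
        byte[] key = SigV4.signingKey("EXAMPLE-SECRET", "20120215", "us-east-1", "es");
        System.out.println(SigV4.sign(key, "string-to-sign-goes-here"));
    }
}
```

The resulting hex signature goes into the `Authorization` header of each request, which is why the signing has to wrap every call the connector makes to Elasticsearch.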
I would recommend doing it in a way where people not using AWS do not have to carry the heavy weight of the AWS deps, for example by using a fork.
We should also not forget that Elasticsearch supports the X-Pack security features, which are another way of adding security on top of it. Fewer people, but still some, use https://search-guard.com/ as their security solution for Elasticsearch.
To me, all of this calls for a solution that is portable and lets people plug in their own module for security and auth.
I hope it makes sense.
Linking for anyone else who comes across this, but it looks like there's a PR for this now https://github.com/confluentinc/kafka-connect-elasticsearch/pull/330
The complication we ran into when trying to use Elasticsearch on AWS with the IP range restriction suggested above is that it also limits requests to the Kibana instance that AWS gives you out of the box. It might not be a big problem depending on your use case, but it's worth noting.
Hello all,
A bit curious. I am trying to pull data out of AWS MSK via the connector into AWS ES. Can anyone shed some light on how I can configure the signer, or on any other way to index into AWS ES?
PS: I am able to connect to AWS MSK; I just want some help indexing into ES.