kafka-connect-http icon indicating copy to clipboard operation
kafka-connect-http copied to clipboard

Added support for auth token endpoint

Open martimors opened this issue 4 years ago • 9 comments

This adds the TokenEndpointAuthenticator class, which can be used to retrieve an auth token by making a POST-request to an HTTP endpoint before each poll. It is not the most efficient approach possible, as these types of tokens have a life span which is typically far larger than the desired polling interval (some providers such as Okta charge per. token too), but at least it is a start and it can be expanded later.

These config attributes were added:

  • "http.auth.url": The full URL for the token endpoint
  • "http.auth.body": Payload for the token request (ie. some escaped json)
  • "http.auth.tokenkeypath": Where in the returned payload to retreive the token from

Here is an example of a connector which also shows the three new config parameters added:

{
    "name": "sertica-voyage-reports.http.source",
    "config": {
        "connector.class": "com.github.castorm.kafka.connect.http.HttpSourceConnector",
        "tasks.max": "1",
        "http.offset.initial": "timestamp=2020-05-08T07:55:44Z",
        "http.request.url": "{{url}}/VoyageReports/search",
        "http.request.headers": "accept: text/plain,Content-Type: application/json",
        "http.request.body": "{  \"distinct\": true,  \"includeSubObjects\": true,  \"criteria\": [    {      \"field\": \"updateDate\",      \"operator\": \"GreaterThan\",      \"values\": [        \"${offset.timestamp?datetime.iso}\"      ]    }  ],  \"sorting\": [    {      \"field\": \"updateDate\",      \"direction\": \"Asc\"    }  ]}",
        "http.request.method": "POST",
        "http.auth.type": "Token_Endpoint", # Token_Endpoint option was added in this PR
        "http.auth.url": "http://dkcph-pmstapp1.dk.dfds.root:9001/Auth", # This was added in this PR
        "http.auth.body": "{\"login\": \"mamor\", \"password\": \"{{password}}\"}", # This was added in this PR
        "http.auth.tokenkeypath": "accessToken", # This was added in this PR
        "http.response.list.pointer": "/",
        "http.response.record.offset.pointer": "key=/voyageReportNo, timestamp=/updateDate",
        "http.response.record.timestamp.parser.zpme": "UTC",
        "http.response.record.timestamp.parser": "com.github.castorm.kafka.connect.http.response.timestamp.NattyTimestampParser",
        "http.timer.interval.millis": "30000",
        "http.timer.catchup.interval.millis": "1000",
        "kafka.topic": "sertica-voyage-reports-cdc"
    }
}

I have also added a dev-dependency (mockhttpserver) to be able to test this functionality with okhttp calls mocked.

What do you think?

martimors avatar Jun 28 '21 11:06 martimors

@dingobar I love the changes in this PR. Something that could be added to address the life span of the token is for example

  • add http.auth.tokenexpirypath to define the place to get the lifespan of the token
  • use some cache in the OkHttpClient to cache the accessToken

what do you think @dingobar?

salimane avatar Aug 11 '21 06:08 salimane

In order:

  • I think this is a bit complicated, as there is no consistent way that this is implemented. Some times, it can only be read from the JWT claims, some times the validity end timestamp is obtained and some times the number of seconds until expiry. Therefore, I don't think this should be part of this particular feature. We can discuss it separately, because whatever solution we land on most likely won't create any breaking changes.
  • This is a great idea, but it requires us to also be aware of what to do when we try to use an expired token, ie. the API returns a 401 or something. I know it's wasteful to create a new token for every single request, but I think its a good way to keep the feature general enough to cover all potential use-cases. It requires further refinement and discussion in my opinion, and since it won't break the current setup it could wait.

For now, it would be nice if I could get the current feature set reviewed, perhaps with backwards compatibility for the above ideas in mind. The features implemented in this PR covers our use-case at our company, and I have only limited resources for further commitments unfortunately.

What do you think?

martimors avatar Aug 16 '21 10:08 martimors

@castorm Any chance we could revive this project somehow? I think it's an awesome connector and it will have many use-cases in our organization.

martimors avatar Sep 17 '21 10:09 martimors

  • I think this is a bit complicated, as there is no consistent way that this is implemented. Some times, it can only be read from the JWT claims, some times the validity end timestamp is obtained and some times the number of seconds until expiry. Therefore, I don't think this should be part of this particular feature. We can discuss it separately, because whatever solution we land on most likely won't create any breaking changes.

What about letting the user set a fixed TTL on the token? In our case I know the token will be valid for one hour so I wouldn't have a problem using that.

lobbin avatar Sep 24 '21 10:09 lobbin

pr-142.diff.txt

Please see attached diff. It's an initial go at caching the access token for a fixed time. The token is saved for "http.auth.tokenexpirytime" seconds. The default value of 0 continues the current behaviour or fetching a new token on every request.

Not sure if it's the proper way of storing the data locally in private properties, but at least it works ;)

I also took the liberty ot changing how to find the token from path() to at(), given that our token result looks like this

{
  "data": {
    "tokens": {
      "accessToken": "<token data">
    }
  }
}

lobbin avatar Oct 18 '21 13:10 lobbin

@lobbin Fixed TTL for the token given in the connector config is fine. However, seeing as this project is pretty dead the past months (no master branch commits since june 16th this year), I won't spend any more time on this until I get a reply from one of the maintainers of this repo. We are not wanting to support our fork of this project and just wanted to introduce some new functionality, so if the community is dead that unfortunately means we cannot use this software for now...

martimors avatar Oct 18 '21 13:10 martimors

@dingobar Yes, I saw that as well. However, I haven't found any other suitable connector so I'm stuck with this one for a while longer.

lobbin avatar Oct 18 '21 14:10 lobbin

Hi @dingobar and @lobbin,

Thank you very much for your contributions.

I'm sorry it took me a bit to reply, I wouldn't say this project is dead, but I definitely don't have the time or incentive to spend too much time on it, so I'd be happy to accept new maintainers if you are up for the task.

Regarding this particular contribution, I'd love to merge it, as this feature is a recurrent request.

However, I'd like for a couple of things to be considered before moving forward, and I perfectly understand if you wouldn't want to invest any minute on it.

In any case, those things are:

  • There were already some integration tests starting up kafka & kafka connect servers. I believe the approach should be unified (preferably continue using testcontainers)
  • This plugin's HttpAuthenticator is integrated as an OkHttp's Authenticator. That means it's called automatically after receiving a 401 status code, and request retried after token refreshed. There might be some flaws in the current implementation preventing us from taking advantage of this, but before merging this PR, I'd like for this to be reviewed. Meaning, IMO Authenticator should only be called after a 401 to refresh the token (or as an optimization on first request as well), for the rest of the requests, last token should be reused. (Might be worth checking this out: https://sangsoonam.github.io/2019/03/06/okhttp-how-to-refresh-access-token-efficiently.html).

Again, thank you very much for your contribution, I believe even if the project is not as active as it used to be, it's still used by some people, and I'm sure this feature would be very much welcome. So I'll be happy to hear back from you.

Best regards.

castorm avatar Oct 24 '21 18:10 castorm

Thanks for the reply - I'll see if I can implement those changes.

About your time and the state of this repo, I agree that this is a very nice plug-in to connect and it will have immense value for us if we manage to implement it. Perhaps you could ask some of those people who you know use it and have contributed to it in the past to help you maintain it? I think the most crucial thing is being able to merge bugfixes and security patches and release them to confluent hub. Perhaps a conversation for another issue.

martimors avatar Oct 26 '21 09:10 martimors

I understand this plugin has gone a bit stale, but this would definitely be a welcome feature.

I unfortunately no longer use Kafka Connect. I welcome you to take over the feature and implement the requested changes from @castorm .

martimors avatar Nov 04 '22 08:11 martimors

I would like to try this or perhaps complete it if possible.

szalapski avatar May 08 '23 19:05 szalapski

@dingobar or @castorm, can you give guidance on how I might help? Above you say you want integration tests as well as adaptation to work with OkHttp's Authenticator.

I'm not sure I can get my environment set up to build or run integration tests.

Also, for what it's worth, I think fetching a token perhaps should be done preemptively, not having to get a 401 before doing so. If Token_Endpoint is the mode but we have no token, my vote is we should go get one before attempting the API call.

szalapski avatar May 08 '23 21:05 szalapski