knex icon indicating copy to clipboard operation
knex copied to clipboard

expirationChecker won't recreate connection object with Postgres client

Open dsbrgg opened this issue 5 years ago • 28 comments

Environment

Knex version: 0.20.3 Database + version: psql (PostgreSQL) 12.1 OS: macOS 10.14.6

Bug

The expirationChecker property on the connection object is not being called to recreate the connection object with new credentials for the database, even though the timeout was reached.

The first time that a request is being made, it works fine but after the timeout, it won't allow to make subsequent connections to the database returning a permission_denined error (as it should). Here's an example of the knex instance being made.

const database = knex({
  ...configuration,
  connection: async () => {
    const { timeout, ...credentials } = await vault.psqlCredentials('my-role');

    const connection = {
      ...configuration.connection,
      ...credentials,
      expirationChecker: () => timeout <= Date.now(),
      ssl: yn(get(configuration, 'connection.ssl', false))
    };

    return connection;
  }
});

It came to my attention that the psql config interface does not have an expirationChecker property in it but, I'm not sure if this has a direct impact with this situation since the function that would update the connection seems to be called on the first time.

dsbrgg avatar Dec 11 '19 12:12 dsbrgg

So, after a couple of digging, I think I was able to find the issue and at least provide some kind of solution, even if a naive one.

The connectionConfigExpirationChecker property is not being used except for the first time. That is happening because it's only being called on the first create call on tarn.js.

From what I've investigated, tarn.js does not have a property or something like that to destroy resources based on expiration.

Basically, I've just added a setTimeout on the updatePoolConnectionSettingsFromProvider and acquiring the resource on the pool to properly close the connection. This also implies that the expirationChecker function should return a value representing the delay in milliseconds to be passed to setTimeout.

I've done some local tests here with the codebase I'm working with and I finally had the behaviour I was expecting without any side-effects from what I can see.

@kibertoad , @oranoran , @Ali-Dalal - maybe you could give some insight in this since you guys were involved in the last PR related to this functionality.

dsbrgg avatar Jan 15 '20 17:01 dsbrgg

I actually don't like or understand why there even is expirationChecker attribute in connection. To me would be better if connection could be declared as a function which then may return config directly or it may return promise, which resolves config.

Then it would be up to client code to implement logic when it wants to get fresh config and when it wants to for example return the same already resolved promise for the old config.

Could someone explain me better how that async / function getting of connection parameters works nowadays or what am I missing here?

elhigu avatar Jan 20 '20 13:01 elhigu

@elhigu I believe that this expirationChecker was introduced along with the PR that I've mentioned, that also introduced the async connection.

I can't tell for every case but at least in my case, this property expirationChecker would be very handy because I need to get new credentials for the database after a set amount of time and I believe this way would be easier since knex could just end the resource on its own.

I believe that the async connection is actually being used as it should but the issue is that, there's in fact a property expirationChecker which seems to not be used at all, besides the first time that tarn.js actually creates the resource.

dsbrgg avatar Jan 21 '20 17:01 dsbrgg

@dsbrgg to be honest, I haven't tried the expirationChecker because, in my cause, It is not needed. But setting the connection config as an async function works like a charm.

@oranoran can you put your resolution on this?

Ali-Dalal avatar Jan 22 '20 02:01 Ali-Dalal

@dsbrgg is your expectation for the configuration to expire at the timeout, or for actual living connections that were created using it to expire? The intention was to implement the former not the latter. The implementation has nothing to do with tarn, it's part of the client code that creates new connections and isn't related to maintaining them once created.

oranoran avatar Jan 22 '20 07:01 oranoran

I can't tell for every case but at least in my case, this property expirationChecker would be very handy because I need to get new credentials for the database after a set amount of time and I believe this way would be easier since knex could just end the resource on its own.

That would be as easy or even easier if knex would just always reload config if it is changed. Then there would be no need for that kind of timer. So to me that parameter still looks like a hack to me instead of feature. I remember that async knex configuration feature first implementation was not very versatile for many use cases either (IIRC it only made possible to get async config once and then there was no way to update it afterwards)...

elhigu avatar Jan 22 '20 10:01 elhigu

@Ali-Dalal Yes, like I said, the async config works fine. My problem has been with the expirationChecker exclusively.

@oranoran I suppose both. Since the connection would have to expire after a timeout, the resource would have to be recreated or you would end up with an invalid knex instance - which is my case, after the timeout, I get invalid credentials errors.

Regardless, the expirationChecker is just being called on the resource creation. If this is what the property was actually intended to do, then I got it wrong. I thought it would be checking if the connection was still valid somehow.

@elhigu I'm sorry but I don't understand what you mean by "reloading the config if it's changed". The timer was just a naive approach to the matter as I was under the impression that it was supposed to verify if the connection was valid from time to time or before requests were made and then call the async connection function again.

dsbrgg avatar Jan 22 '20 11:01 dsbrgg

I'm sorry but I don't understand what you mean by "reloading the config if it's changed".

I mean something like this:

let config = {};
setInterval(() => {
  config = generateNewConfig();
}, 5000); 

{
  client: 'pg',
  connection: () => {
     return config;
  }
}

That would cause knex to reload every 5 seconds, since returned configuration object is changing every 5 seconds (of course connection callback could also return a promise which would be asynchronously resolved).

I was under the impression that it was supposed to verify if the connection was valid from time to time or before requests were made and then call the async connection function again.

I'm pretty sure that it does not check if the connection details are valid, but only re-reads new config with that interval.

elhigu avatar Jan 22 '20 12:01 elhigu

@elhigu Thanks for the example. I guess this could do it.

I'm pretty sure that it does not check if the connection details are valid, but only re-reads new config with that interval.

I think this is not actually happening. At least from what I've tested, even after the expiration was passed, connection was not being called again to re-read the config/update the config details.

That's what I mean when I say that the connection is valid or not, since the expiration, for me, means that at that point that the connection would no longer be valid.

It is called once when the connection is first being created and never again to see if the expirationChecker will return true. If that's the intended behaviour than my bad, I misunderstood the point of the property.

Anyway, I'll try something around that example you've showed.

dsbrgg avatar Jan 22 '20 14:01 dsbrgg

Anyway, I'll try something around that example you've showed.

I don't think that my example will work at all, since it is an imaginary API how I would have wanted that configuration reloading feature to work.

elhigu avatar Jan 24 '20 11:01 elhigu

Oh sorry, I thought you meant as if that could solve it.

Could this function be called upon the validation of a connection?

This way the connection could be destroyed and also the connection would be rebuilt at that point. This way it would prevent any type of timer.

I suppose the property connectionConfigExpirationChecker is set to true by default on initialisation, so maybe a variable that would contain the function for expirationChecker and will be checked and called upon validate, maybe something along these lines. I've tried some tests here and it worked as well.

dsbrgg avatar Jan 24 '20 17:01 dsbrgg

Maybe... more people should read through those connection parts of the code base to make sure that it is a valid solution... I don't know when I would have time to concentrate on that.

elhigu avatar Jan 24 '20 20:01 elhigu

I don't need this immediately as I came up with this issue for an upcoming feature on my project but it would be great if this could be implemented. Thanks for the responses as well @elhigu!

dsbrgg avatar Jan 27 '20 07:01 dsbrgg

Hey, digging this issue up again. I have pretty much the same use-case with rotating creds. Currently the only reliable way I've found to expire connections using stale creds is to .destroy() and .initialize() the connection pool.

I think the proposed solution three comments above is a very reasonable one, and shouldn't introduce any unexpected new logic. If expirationChecker returns true, the implication there is that the configuration which was loaded is no longer valid, and I'm not sure if there is a situation in which that would not mean connections using that config are no longer valid.

Let's turn this into a PR?

sprusr avatar Jun 30 '21 07:06 sprusr

@sprusr I don't like that expiration checker way of implementing it. It is really limited and seems hacky. Just allowing that the configuration is a callback function returning configuration object or a promise resolving to configuration object should be enough for knex to know when the configuration is changed and requires reload (just simply by comparing identity of old configuration and the one returned from callback).

What is needed to make this to PR would be to describe clearly what pool will do when configuration is changed and there are still old connections with old configuration alive. For example ongoing transactions are like that. What would be wanted behavior in that case.

elhigu avatar Jun 30 '21 12:06 elhigu

@elhigu not sure I agree that the expirationChecker is limited or hacky, but certainly comparing what is returned/resolved from connection callback would also work nicely. Although I expect that currently in many uses of Knex, people are doing quite heavy lifting in that async connection function, expecting it to be called very rarely or even only once. In that respect this could be considered a breaking change of sorts.

Great point about transactions, that throws something interesting into the mix. I suppose expected behaviour would be to continue using the same connection for the transaction. Then when that connection returns to idle, it would be destroyed. There is some concept of being "idle" in tarn I believe, maybe that can help here?

Or maybe it's much more complicated to dictate how "expired" connections should behave.

sprusr avatar Jun 30 '21 12:06 sprusr

Knex, people are doing quite heavy lifting in that async connection function, expecting it to be called very rarely or even only once. In that respect this could be considered a breaking change of sorts.

If there is too much heavy lifting done, then they should add cache to it.

In that respect this could be considered a breaking change of sorts.

I thought knex didn't allow passing function to connection object yet. If it already is like that then yeah, it will be a breaking change, but still easy to fix with any memoize helper.

elhigu avatar Jun 30 '21 14:06 elhigu

I thought knex didn't allow passing function to connection object yet

There's a section under here, 8th code example down: https://knexjs.org/#Installation-client

Just to give as much context as possible: that example is also again the use-case which I'm working with here, that the creds expire after some time. The trouble with the example is that open connections continue to be used, and the way that my credentials are expired are just that privileges for that user are revoked and connection is not explicitly dropped by the server (I believe this is quite normal behaviour). The responsibility is on the client to stop using the expired connections (preferably before the creds have actually expired) and start using connections with newer creds.

If there is too much heavy lifting done, then they should add cache to it

Yeah sure if that connection function is behaving in the way proposed here. The trouble is that the current behaviour is that this function is called only when a new connection pool is being created, or when expirationChecker has returned true when creating a new connection. It's even recommended in that example above to do things like asynchronous fetching of creds. So people are doing their expensive operations in there.

The reason I think the better choice here for solving this use-case is the expirationChecker is because it's already there, recommended in documentation, and implementations using it in its current form would not break if it was changed to be called more often (before connections are acquired from the pool, as well as before they are created). Avoiding a breaking change is definitely a nice bonus here.

What I'm interested in is why a call to expirationChecker wasn't added to the pool's validate function in #3497 in the first place. If this was a deliberate decision, or if there's something I'm missing as to why it's a bad idea. Or perhaps I misunderstand the scope of what expirationChecker is intended to be used for.

By the way, thanks for engaging with this after I dug it up after a long time. I wasn't expecting to get responses so soon!

sprusr avatar Jul 01 '21 06:07 sprusr

The reason I think the better choice here for solving this use-case is the expirationChecker is because it's already there, recommended in documentation,

Right. Yeah now I get it and makes sense to use that ready made feature.

Though I'm a bit sad that the dynamic configuration thing was implemented that way...

What I'm interested in is why a call to expirationChecker wasn't added to the pool's validate function in #3497 in the first place. If this was a deliberate decision, or if there's something I'm missing as to why it's a bad idea. Or perhaps I misunderstand the scope of what expirationChecker is intended to be used for.

Probably it was fine for their use-case to be using old connections made with old credentials when the feature was written. And I 'm still not sure if it should be there always of is that also something opinionated functionality, that for some people it should be there and for others it is not necessary.

One should be able to give a custom validation function for tarn in pool's config.

elhigu avatar Jul 02 '21 11:07 elhigu

@elhigu Hello I'm getting this exact issue in my project and it's a blocker for me. The proposal solution seems to be exact what you are mentioning in your comment give a custom validation function for tarn in pool's config.

Proposal idea adds expirationChecker in validate function and this is passed to tarn config. It's working well for me. Can you explain more?

validate: (connection) => {
        if (isConnectionExpired()) {
          this.logger.warn(`Connection Expired`);
          return false;
        }

        if (connection.__knex__disposed) {
          this.logger.warn(`Connection Error: ${connection.__knex__disposed}`);
          return false;
        }

        return this.validateConnection(connection);
      },

toanvc avatar Aug 20 '21 16:08 toanvc

Right now we are blocked from using RDS with IAM authentication due to this bug. expirationChecker is in the Knex documentation but currently does nothing at all, how can we get a fix merged in?

julrich1 avatar Aug 20 '21 16:08 julrich1

@julrich1 Best solution would be implementing support for custom validation function in https://github.com/vincit/tarn.js/ Second best would be making knex.js implementation more flexible.

PRs for either approach would be most welcome.

kibertoad avatar Aug 20 '21 17:08 kibertoad

@julrich1 Best solution would be implementing support for custom validation function in https://github.com/vincit/tarn.js/ Second best would be making knex.js implementation more flexible.

PRs for either approach would be most welcome.

It looks like this proposal https://github.com/dsbrgg/knex/blob/connection-timeout/lib/client.js#L266-L309 does indeed pass the expirationChecker to the Tarn validate function, I'd be happy to work on an MR but it seems like @elhigu had some concerns with the approach.

julrich1 avatar Aug 20 '21 17:08 julrich1

@julrich1 Is there a way to implement it in a way that would be completely opt-in and does not affect people who don't need it?

From my understanding the main concern was this: And I 'm still not sure if it should be there always of is that also something opinionated functionality, that for some people it should be there and for others it is not necessary.

kibertoad avatar Aug 20 '21 17:08 kibertoad

The proposed change makes use of the existing expirationChecker field from the Knex configuration object (which currently does nothing) and only uses it if it is set, unless users are setting that expirationChecker property, there would be no behavior change for them.

julrich1 avatar Aug 20 '21 17:08 julrich1

@julrich1 Sounds like a good approach then.

kibertoad avatar Aug 20 '21 17:08 kibertoad

Resurrecting this thread again. Based on the discussion above it looks like the solution provided in this solution would be ideal fix. can we please get this reviewed and merged to the master, if its not already fixed in some other PR? This is indeed a blocker for a project I'm working on.

amarsiingh avatar Sep 19 '22 18:09 amarsiingh

@amarsiingh could you create a PR with that approach?

kibertoad avatar Sep 19 '22 18:09 kibertoad

Did someone find a way around this? We are having same issue with IAM token authentication (it expires every 15 mins). I've tried. Was the PR ever created/merged?

deepmetalpay avatar Jun 15 '23 15:06 deepmetalpay

We are still on knex 2.0.0 and the following is working like a charm:

return knex({
    client: "postgres",
    connection: async () => {
      const { token, expiresAt } = await getToken();

      return {
        host: `/cloudsql/${process.env.CLOUD_SQL_CONNECTION_NAME}`,
        database: process.env.CLOUD_SQL_DATABASE,
        user: account.split(".gserviceaccount.com")[0],
        password: token,
        expirationChecker: () => {
          const remainingTime = expiresAt.diff(DateTime.now());
          // Google caches tokens until they have less than 5 minutes of
          // remaining validity. So we keep tokens exactly that long:
          // https://cloud.google.com/compute/docs/access/create-enable-service-accounts-for-instances#applications
          return remainingTime <= Duration.fromObject({ minutes: 5 });
        },
      };
    },
    pool: {
      min: 5,
      max: 9, // our cloudSQL allows at most 19 connections
    },
  });

ploth avatar Jun 15 '23 16:06 ploth