[performance] add contextCache +150 r/s and reduce latency
I'm submitting a
- [x] feature request
PostGraphile version: 4.4.4
Current behavior:

There is currently no cache for `PostGraphileContext`. Creating the `PostGraphileContext` takes about 0.8~1.2 ms per request; sometimes this step is even slower than the real PG query, which may take only 0.4~0.6 ms. Adding an LRU cache for recently seen users would improve both throughput and latency.

ENV: CPU i7 4790K, single Node.js process without cluster, single PostgreSQL 11 DB on the same machine.
Current performance:

```
Finished 10000 requests

Concurrency Level:      10
Time taken for tests:   5.348 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    10000
Total transferred:      7130035 bytes
Total body sent:        6350000
HTML transferred:       2160000 bytes
Requests per second:    1869.72 [#/sec] (mean)
Time per request:       5.348 [ms] (mean)
Time per request:       0.535 [ms] (mean, across all concurrent requests)
Transfer rate:          1301.87 [Kbytes/sec] received
                        1159.45 kb/s sent
                        2461.32 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     3    5   0.6      5      14
Waiting:        3    5   0.6      5      14
Total:          3    5   0.6      5      14

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      5
  75%      5
  80%      6
  90%      6
  95%      6
  98%      7
  99%      7
 100%     14 (longest request)
```
Expected behavior:

Add a cache for `PostGraphileContext`.

Performance after adding the cache:
```
Finished 10000 requests

Concurrency Level:      10
Time taken for tests:   4.926 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    10000
Total transferred:      7130016 bytes
Total body sent:        6350000
HTML transferred:       2160000 bytes
Requests per second:    2030.00 [#/sec] (mean)
Time per request:       4.926 [ms] (mean)
Time per request:       0.493 [ms] (mean, across all concurrent requests)
Transfer rate:          1413.47 [Kbytes/sec] received
                        1258.84 kb/s sent
                        2672.31 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     3    5   0.6      5      12
Waiting:        3    5   0.6      5      12
Total:          3    5   0.6      5      12

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      5
  75%      5
  80%      5
  90%      5
  95%      6
  98%      6
  99%      7
 100%     12 (longest request)
```
Hack on `node_modules\postgraphile\build\postgraphile\withPostGraphileContext.js`:

```js
// roytan
// at the top of the file
const lru_1 = require("@graphile/lru");
const crypto = require("crypto");
const { createHash } = crypto;
const _contextCache = new lru_1.default({
  maxLength: 2000,
});

// ...inside withDefaultPostGraphileContext
const operationType = operation != null ? operation.operation : null;
// roytan: derive a cache key from everything that influences the context
const lastArgs = JSON.stringify({
  jwtToken,
  jwtSecret,
  jwtAudiences,
  jwtRole,
  jwtVerifyOptions,
  pgDefaultRole,
  pgSettings,
});
const lastHash = createHash("sha1").update(lastArgs).digest("base64");
let hashObj = _contextCache.get(lastHash);
if (!hashObj) {
  // ...original code computing pgRole, jwtClaims and sqlSettingsQuery...
  const needTransaction =
    pgForceTransaction ||
    !!sqlSettingsQuery ||
    (operationType !== 'query' && operationType !== 'subscription');
  // roytan: remember the computed context for identical future requests
  hashObj = { pgRole, jwtClaims, sqlSettingsQuery, needTransaction };
  _contextCache.set(lastHash, hashObj);
}
const { pgRole, jwtClaims, sqlSettingsQuery, needTransaction } = hashObj;
```
Which part of the context do you feel is expensive? I'd expect it to be the provisioning of the PostgreSQL client, which cannot be safely re-used as it must have its transaction cleared and be returned to the pool. Let me know if it's not the client you're referring to - always interested in performance improvements!
@benjie in `withDefaultPostGraphileContext`, this part:

```ts
const { role: pgRole, localSettings, jwtClaims } = getSettingsForPgClientTransaction({
  jwtToken,
  jwtSecret,
  jwtPublicKey,
  jwtAudiences,
  jwtRole,
  jwtVerifyOptions,
  pgDefaultRole,
  pgSettings,
});
const sqlSettings: Array<sql.SQLQuery> = [];
if (localSettings.length > 0) {
  // Later settings should win, so we're going to loop backwards and not
  // add settings for keys we've already seen.
  const seenKeys: Array<string> = [];
  // TODO:perf: looping backwards is slow
  for (let i = localSettings.length - 1; i >= 0; i--) {
    const [key, value] = localSettings[i];
    if (!seenKeys.includes(key)) {
      seenKeys.push(key);
      // Make sure that the third config is always `true` so that we are only
      // ever setting variables on the transaction.
      // Also, we're using `unshift` to undo the reverse-looping we're doing
      sqlSettings.unshift(sql.fragment`set_config(${sql.value(key)}, ${sql.value(value)}, true)`);
    }
  }
}
const sqlSettingsQuery =
  sqlSettings.length > 0 ? sql.compile(sql.query`select ${sql.join(sqlSettings, ', ')}`) : null;
// If we can avoid transactions, we get greater performance.
const needTransaction =
  pgForceTransaction ||
  !!sqlSettingsQuery ||
  (operationType !== 'query' && operationType !== 'subscription');
```
is what can be cached. All of this work to compute `sqlSettings` happens before the PG client is even initialised, and sometimes it takes more time than the actual PG network request.
How many settings do you have in pgSettings/your JWTs/etc.?
@benjie Apart from PostGraphile's default JWT fields, I only add one `uid` field: a string of about 20 bytes.
Hmmm. Okay cool I'll have a go at optimising this code. An LRU is an acceptable solution, but there might be some cheaper wins here, and wins that can be applied to the rest of the codebase potentially. Thank you for bringing this to my attention :raised_hands:
I stumbled upon this part as well when reading through the code. I think `localSettings` should simply become a `Map` instead of that array of tuples - that would allow settings to be overwritten rather than appended, and would eliminate the need for the weird backward loop and the `seenKeys` check. I can make a PR if you want.
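The idea could look something like this (a sketch only; `dedupeSettings` is a hypothetical helper). One subtlety: `Map` keeps a key at its *first* insertion position when you overwrite it, while the existing backward loop keeps the *last* occurrence's position, so a `delete` before each `set` is needed to reproduce the current ordering exactly:

```javascript
// Sketch: Map-based replacement for the seenKeys dedup, preserving the
// existing "last occurrence wins, in last-occurrence order" semantics.
function dedupeSettings(localSettings) {
  const map = new Map();
  for (const [key, value] of localSettings) {
    map.delete(key); // move the key to the end on re-insertion
    map.set(key, value); // later values win automatically
  }
  return [...map.entries()];
}

// "role" appears twice; its last value survives and it sorts after
// "search_path", exactly as the unshift-based loop would produce.
const result = dedupeSettings([
  ["role", "viewer"],
  ["search_path", "app"],
  ["role", "admin"],
]);
```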
Sure; if you can ensure the code produces the same result in all cases (even when there's a mixture of overlapping inputs from multiple sources) I'm totally up for you changing it to a Map. When Maps were first introduced they were incredibly slow but the geniuses at v8 have now massively improved their performance and now they're sometimes even faster than the traditional objects/etc. Node 8 ends LTS at the end of December so we're starting with a Node 10 base in the new year which is fantastic :tada:
(I'd like to see benchmarks to prove this is faster before merging nonetheless.)
This issue is ancient and unlikely to still be relevant.