[performance] add contextCache +150 r/s and reduce latency
I'm submitting a
- [x] feature request
PostGraphile version: 4.4.4
Current behavior:

There is currently no cache for `PostGraphileContext`. Creating the `PostGraphileContext` takes about 0.8~1.2 ms per request; sometimes this step is even slower than the real PG query, which may take only 0.4~0.6 ms. Adding an LRU cache for recently seen users would improve both throughput and latency.

ENV: CPU i7 4790K, single Node.js process without cluster, single PostgreSQL 11 DB on the same machine.
Current performance:

```
Finished 10000 requests

Concurrency Level:      10
Time taken for tests:   5.348 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    10000
Total transferred:      7130035 bytes
Total body sent:        6350000
HTML transferred:       2160000 bytes
Requests per second:    1869.72 [#/sec] (mean)
Time per request:       5.348 [ms] (mean)
Time per request:       0.535 [ms] (mean, across all concurrent requests)
Transfer rate:          1301.87 [Kbytes/sec] received
                        1159.45 kb/s sent
                        2461.32 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     3    5   0.6      5      14
Waiting:        3    5   0.6      5      14
Total:          3    5   0.6      5      14

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      5
  75%      5
  80%      6
  90%      6
  95%      6
  98%      7
  99%      7
 100%     14 (longest request)
```
Expected behavior:

Add a cache for `PostGraphileContext`.

Performance after adding the cache:
```
Finished 10000 requests

Concurrency Level:      10
Time taken for tests:   4.926 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    10000
Total transferred:      7130016 bytes
Total body sent:        6350000
HTML transferred:       2160000 bytes
Requests per second:    2030.00 [#/sec] (mean)
Time per request:       4.926 [ms] (mean)
Time per request:       0.493 [ms] (mean, across all concurrent requests)
Transfer rate:          1413.47 [Kbytes/sec] received
                        1258.84 kb/s sent
                        2672.31 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     3    5   0.6      5      12
Waiting:        3    5   0.6      5      12
Total:          3    5   0.6      5      12

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      5
  75%      5
  80%      5
  90%      5
  95%      6
  98%      6
  99%      7
 100%     12 (longest request)
```
Hack on `node_modules\postgraphile\build\postgraphile\withPostGraphileContext.js`:

```js
// roytan
// at the top of the file
const lru_1 = require("@graphile/lru");
const crypto = require("crypto");
const { createHash } = crypto;
const _contextCache = new lru_1.default({
  maxLength: 2000,
});

// ...inside withDefaultPostGraphileContext
const operationType = operation != null ? operation.operation : null;
// roytan: derive a cache key from everything that influences the context
const lastArgs = JSON.stringify({
  jwtToken,
  jwtSecret,
  jwtAudiences,
  jwtRole,
  jwtVerifyOptions,
  pgDefaultRole,
  pgSettings,
});
const lastHash = createHash("sha1").update(lastArgs).digest("base64");
let hashObj = _contextCache.get(lastHash);
if (!hashObj) {
  // ...original code computing pgRole, jwtClaims and sqlSettingsQuery...
  const needTransaction =
    pgForceTransaction ||
    !!sqlSettingsQuery ||
    (operationType !== 'query' && operationType !== 'subscription');
  // roytan: remember the computed context for identical future requests
  hashObj = { pgRole, jwtClaims, sqlSettingsQuery, needTransaction };
  _contextCache.set(lastHash, hashObj);
}
const { pgRole, jwtClaims, sqlSettingsQuery, needTransaction } = hashObj;
```
Which part of the context do you feel is expensive? I'd expect it to be the provisioning of the PostgreSQL client, which cannot be safely re-used as it must have its transaction cleared and be returned to the pool. Let me know if it's not the client you're referring to - always interested in performance improvements!
@benjie in `withDefaultPostGraphileContext`, this part:

```ts
const { role: pgRole, localSettings, jwtClaims } = getSettingsForPgClientTransaction({
  jwtToken,
  jwtSecret,
  jwtPublicKey,
  jwtAudiences,
  jwtRole,
  jwtVerifyOptions,
  pgDefaultRole,
  pgSettings,
});
const sqlSettings: Array<sql.SQLQuery> = [];
if (localSettings.length > 0) {
  // Later settings should win, so we're going to loop backwards and not
  // add settings for keys we've already seen.
  const seenKeys: Array<string> = [];
  // TODO:perf: looping backwards is slow
  for (let i = localSettings.length - 1; i >= 0; i--) {
    const [key, value] = localSettings[i];
    if (!seenKeys.includes(key)) {
      seenKeys.push(key);
      // Make sure that the third config is always `true` so that we are only
      // ever setting variables on the transaction.
      // Also, we're using `unshift` to undo the reverse-looping we're doing
      sqlSettings.unshift(sql.fragment`set_config(${sql.value(key)}, ${sql.value(value)}, true)`);
    }
  }
}
const sqlSettingsQuery =
  sqlSettings.length > 0 ? sql.compile(sql.query`select ${sql.join(sqlSettings, ', ')}`) : null;
// If we can avoid transactions, we get greater performance.
const needTransaction =
  pgForceTransaction ||
  !!sqlSettingsQuery ||
  (operationType !== 'query' && operationType !== 'subscription');
```
is what can be cached. All of this work to compute `sqlSettings` happens before the PG client is even initialised, and sometimes it takes more time than the actual PG network request.
How many settings do you have in pgSettings/your JWTs/etc.?
@benjie Apart from PostGraphile's default JWT fields, I only add one `uid` field: a string of about 20 bytes.
Hmmm. Okay cool I'll have a go at optimising this code. An LRU is an acceptable solution, but there might be some cheaper wins here, and wins that can be applied to the rest of the codebase potentially. Thank you for bringing this to my attention :raised_hands:
I stumbled upon this part as well when reading through the code. I think `localSettings` should simply become a `Map` instead of that array of tuples - that would allow settings to be overwritten rather than appended, and would eliminate the need for the weird backward loop and the `seenKeys` check. I can make a PR if you want.
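The idea could look something like this (a sketch only; `dedupeSettings` is a hypothetical helper). One subtlety: `Map` keeps a key at its *first* insertion position when you overwrite it, while the existing backward loop keeps the *last* occurrence's position, so a `delete` before each `set` is needed to reproduce the current ordering exactly:

```javascript
// Sketch: Map-based replacement for the seenKeys dedup, preserving the
// existing "last occurrence wins, in last-occurrence order" semantics.
function dedupeSettings(localSettings) {
  const map = new Map();
  for (const [key, value] of localSettings) {
    map.delete(key); // move the key to the end on re-insertion
    map.set(key, value); // later values win automatically
  }
  return [...map.entries()];
}

// "role" appears twice; its last value survives and it sorts after
// "search_path", exactly as the unshift-based loop would produce.
const result = dedupeSettings([
  ["role", "viewer"],
  ["search_path", "app"],
  ["role", "admin"],
]);
```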
Sure; if you can ensure the code produces the same result in all cases (even when there's a mixture of overlapping inputs from multiple sources) I'm totally up for you changing it to a Map. When Maps were first introduced they were incredibly slow but the geniuses at v8 have now massively improved their performance and now they're sometimes even faster than the traditional objects/etc. Node 8 ends LTS at the end of December so we're starting with a Node 10 base in the new year which is fantastic :tada:
(I'd like to see benchmarks to prove this is faster before merging nonetheless.)
This issue is ancient and unlikely to still be relevant.