node-mysql2
Performance: instantiate row from the same base class instead of starting from generic {} object
@testn we did have some initial discussion when the original change was made in #1402, but I wanted to review and benchmark again. I think the fix should be simple, and worth adding if the performance numbers are significant enough.
A bit of context: previously, the generated parser class itself was used to create the row object instance. V8 caches field access for objects created from the same constructor (they share a hidden class), so in theory doing
function row() {
  this.field1 = 1;
  this.field2 = 2;
  // etc
}
new row(); // repeat many times
is much faster than
function row() {
  const result = {};
  result.field1 = 1;
  result.field2 = 2;
  // etc
  return result;
}
row(); // repeat many times
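(Side note: the hidden-class effect can be observed directly with V8 debug intrinsics; the snippet below is only an experiment and assumes node is started with the --allow-natives-syntax flag.)

// Run with: node --allow-natives-syntax shapes.js
// %HaveSameMap is a V8 debug intrinsic, not a public API.
function row() {
  this.field1 = 1;
  this.field2 = 2;
}
const a = new row();
const b = new row();
// Instances of the same constructor share one hidden class (map),
// which is what keeps field access monomorphic and cheap.
console.log(%HaveSameMap(a, b)); // true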
We could potentially generate a proxy RowConstructor class that assigns all fields passed to the constructor.
For example, the currently generated code for connection.query("show tables") is
(function () {
  return class TextRow {
    constructor(fields) {
    }
    next(packet, fields, options) {
      this.packet = packet;
      const result = {};
      // "Tables_in_mysql": VAR_STRING
      result["Tables_in_mysql"] = packet.readLengthCodedString("utf8");
      return result;
    }
  };
})()
We could make it something like this:
(function () {
  return class TextRow {
    constructor(fields) {
      this.RowConstructor = function (field1) {
        // "Tables_in_mysql": VAR_STRING
        this["Tables_in_mysql"] = field1;
      };
    }
    next(packet, fields, options) {
      this.packet = packet;
      return new this.RowConstructor(
        packet.readLengthCodedString("utf8")
      );
    }
  };
})()
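For illustration, a generic version of that idea could be produced by a small codegen helper. This is a hypothetical sketch (buildRowConstructor is not a mysql2 API), assuming field names are escaped via JSON.stringify:

// Builds a RowConstructor with one parameter and one named-property
// assignment per column, so every row created by the returned
// constructor shares the same V8 hidden class.
function buildRowConstructor(fieldNames) {
  const params = fieldNames.map((_, i) => `field${i}`);
  const body = fieldNames
    .map((name, i) => `this[${JSON.stringify(name)}] = field${i};`)
    .join("\n");
  // Compiled once per result-set shape, like the generated TextRow.
  return new Function(...params, body);
}

// Usage sketch:
const RowConstructor = buildRowConstructor(["Tables_in_mysql"]);
const row = new RowConstructor("mysql2_test");
console.log(row["Tables_in_mysql"]); // 'mysql2_test'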
A simple benchmark shows a 50-60x difference in row construction time. I'm not sure how big the penalty is in the total mix, but it might still be visible (constructing an object with 7 fields 10M times and then accessing all fields):
const parse1 = function (base) {
  const result = {};
  result.id = 1;
  result.int_0 = base + 0;
  result.int_1 = base + 1;
  result.int_2 = base + 2;
  result.int_3 = base + 3;
  result.int_4 = base + 4;
  result.int_5 = base + 5;
  result.int_6 = base + 6;
  return result;
};
const parse2 = function (base) {
  this.id = 1;
  this.int_0 = base + 0;
  this.int_1 = base + 1;
  this.int_2 = base + 2;
  this.int_3 = base + 3;
  this.int_4 = base + 4;
  this.int_5 = base + 5;
  this.int_6 = base + 6;
  return this;
};
// benchmark 2 versions: 1) parse1, 2) parse2
const NUM_ITEMS = 10000000;
const benchmark = function () {
  let results = new Array(NUM_ITEMS);
  const start = Date.now();
  let sum = 0;
  for (let i = 0; i < NUM_ITEMS; i++) {
    let result = parse1(i % 1000);
    results[i] = result;
    sum +=
      result.id +
      result.int_0 +
      result.int_1 +
      result.int_2 +
      result.int_3 +
      result.int_4 +
      result.int_5 +
      result.int_6;
  }
  const end = Date.now();
  console.log(`parse1: ${end - start} ms; sum: ${sum}`);
  results = new Array(NUM_ITEMS);
  const start2 = Date.now();
  sum = 0;
  for (let i = 0; i < NUM_ITEMS; i++) {
    // parse2 assigns to `this`, so it must be called with `new`
    let result = new parse2(i % 1000);
    results[i] = result;
    sum +=
      result.id +
      result.int_0 +
      result.int_1 +
      result.int_2 +
      result.int_3 +
      result.int_4 +
      result.int_5 +
      result.int_6;
  }
  const end2 = Date.now();
  console.log(`parse2: ${end2 - start2} ms, sum: ${sum}`);
};
for (let i = 0; i < 10; i++) {
  benchmark();
}
parse1 runs in approx. 1600 ms, while parse2 takes approx. 30 ms.
I think I'd prefer refactoring the code back to reading the packet directly in a generated row constructor, rather than calling the constructor with all fields passed as a list of arguments.
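Roughly, the generated code under that approach would look like the sketch below (based on the "show tables" example above, not the actual generated output):

(function () {
  return class TextRow {
    constructor(packet, fields, options) {
      // The constructor consumes the packet itself: fields are
      // assigned directly on `this`, so every row shares one hidden
      // class and no per-row argument list has to be built.
      // "Tables_in_mysql": VAR_STRING
      this["Tables_in_mysql"] = packet.readLengthCodedString("utf8");
    }
  };
})()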
Benchmarking "select 1000 rows, each row has 1 to 1000 INT columns"; the y axis is "columns x 1000 per millisecond", the higher the better.
The top family of 3 graphs is mysql2@2.3.0; the family below is the current 3.4.
Interestingly, the difference is more noticeable on a single-CPU box (benchmarked on a hobby railway.app server -> free PlanetScale database). When I test on my local laptop the difference is less prominent, BUT CPU consumption for the node process is 300-400%. My theory: the V8 optimiser runs in separate thread(s) and, when there is enough headroom, does not affect the main JS thread much.
I'll add a benchmark later with the "interpreting" parser, i.e. the one that does not have the "generate parser JS code and eval it upfront" pass.
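Roughly, an interpreting parser replaces the generated class with a loop over the column metadata. The sketch below is a simplification (the Types table and dispatch are placeholder assumptions, not the #2099 code):

// Simplified "interpreting" text parser: no codegen/eval pass,
// just a per-row loop that dispatches on each column's type.
const Types = { LONG: 3 }; // MySQL protocol type code for INT
function parseTextRow(packet, fields) {
  const result = {};
  for (let i = 0; i < fields.length; i++) {
    const field = fields[i];
    switch (field.columnType) {
      case Types.LONG:
        result[field.name] = parseInt(
          packet.readLengthCodedString("ascii"),
          10
        );
        break;
      default:
        // The real parser handles many more types, NULLs, encodings...
        result[field.name] = packet.readLengthCodedString("utf8");
    }
  }
  return result;
}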
Top graph (best performance): mysql2@2.3.0. Middle: current. Bottom: the "static / interpreting parser" from the #2099 branch. I was hoping to see an intersection and switch to the "static" parser when the response contains a large number of columns, but it looks like the current version performs better across the board (and 2.3.0 performs even better).