js-bson
perf(NODE-6246): Significantly improve memory usage and performance of ObjectId
Description
This MR refactors the ObjectId class to persist a hex string instead of a raw Buffer, which greatly reduces memory usage and improves the performance of ObjectIds.
Why?
- Buffer (or TypedArray) is inefficient at storing small amounts of data
- A Buffer of size 12 takes 96 bytes in V8! An ObjectId backed by a Buffer is 128 bytes each
- A string of 24 characters takes 40 bytes; an ObjectId backed by a string is 72 bytes (quick demo below)
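You can see this overhead with a quick experiment (a rough sketch; exact numbers vary by Node/V8 version and are not part of this MR's benchmarks):

const N = 1_000_000;

function measure(fn) {
  global.gc?.(); // run with --expose-gc for more stable numbers (optional)
  const before = process.memoryUsage().rss;
  const kept = fn(); // keep the array alive so nothing is collected before we measure
  const after = process.memoryUsage().rss;
  console.log(`${fn.name}: ${Math.round((after - before) / 1024 / 1024)}MB retained`);
  return kept;
}

// 1M twelve-byte Buffers vs 1M 24-character hex strings
measure(function buffers() {
  const arr = [];
  for (let i = 0; i < N; i++) arr.push(Buffer.allocUnsafe(12));
  return arr;
});
measure(function hexStrings() {
  const arr = [];
  for (let i = 0; i < N; i++) arr.push(i.toString(16).padStart(24, '0'));
  return arr;
});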
Performance Improvements
| Operation | ObjectId (ops/sec) | ObjectId STRING (ops/sec) | Relative Improvement (%) |
|---|---|---|---|
| new ObjectId() | 10,398,719 | 11,263,765 | 8.32% |
| new ObjectId(string) | 4,965,933 | 14,869,618 | 199.43% |
| deserialize | 6,955,399 | 6,431,123 | -7.54% |
| serialize | 3,881,206 | 3,247,690 | -16.32% |
| fromExtendedJSON | 4,821,665 | 50,886,088 | 955.36% |
| toExtendedJSON | 10,988,941 | 134,783,499 | 1126.54% |
| toHexString | 10,899,295 | 131,767,192 | 616.63% |
| .equals(objectId) | 39,653,912 | 44,413,312 | 12.00% |
| .equals(string) | 7,691,853 | 55,222,764 | 617.94% |
| .isValid(string) | 137,247 | 21,470,293 | 15543.49% |
Real world use cases
In the benchmarks above you can see there are some performance regressions in raw BSON serialize/deserialize. This is because we now need to convert a buffer to/from a string instead of just copying bytes, but this regression appears to be negligible in the "real world" benchmarks below.
I believe the 2 most common use cases are:
- Retrieving documents from the DB and returning them to the user.
- Creating a new document and persisting it to the DB.
Using the MFlix Sample Dataset "Comments" documents.
| Operation | ObjectId (ops/sec) | ObjectId STRING (ops/sec) | Relative Improvement (%) |
|---|---|---|---|
| GET 100 docs (deserialize + stringify) | 3,762 | 3,855 | 2.46% |
| Create new doc (create + serialize) | 673,069 | 710,472 | 5.56% |
Memory Improvements
With this change, retained memory usage is reduced by ~45% (ObjectId size decreases from 128 bytes to 72 bytes). I've also removed an extra buffer allocation during deserialization, which reduces strain on the internal buffer pool and makes it less likely the pool has to allocate another block of Buffer.poolSize bytes.
By reducing the amount of memory that needs to be allocated, we further improve performance, since garbage collection is quite expensive.
Example
Using the MFlix Sample Dataset "Comments" collection. Grabbing 1 million documents
const bef = process.memoryUsage().rss;
const docs = await comments.find({}, { limit: 1000000 }).toArray(); // let's grab 1 million documents
const aft = process.memoryUsage().rss;
console.log(`Memory used: ${Math.floor((aft - bef) / 1024 / 1024)}MB`);
BEFORE: 654 MB used
AFTER: 343 MB used
-47% reduction in memory usage
cacheBuffer Option
Similar to the existing cacheHexString static variable, this MR adds a cacheBuffer option that persists the Buffer on the ObjectId to speed up operations that require the buffer, such as .id.
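Assuming it's toggled the same way cacheHexString is today (a static flag), usage would look roughly like this; this is a sketch of the intended shape, not final API documentation:

const { ObjectId } = require('bson');

// Opt in globally, mirroring the existing ObjectId.cacheHexString static flag (assumed shape).
ObjectId.cacheBuffer = true;

const oid = new ObjectId();
oid.id;            // the 12-byte buffer is built once and kept on the instance for later .id / serialize calls
oid.toHexString(); // string-backed, so this stays cheap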
Previous ObjectId w/ cacheHexString vs. new ObjectId w/ cacheBuffer
| Operation | bson (ops/sec) | bsonSTR (ops/sec) | Relative Improvement (%) |
|---|---|---|---|
| new ObjectId() | 5,535,118 | 5,481,568 | -0.97% |
| new ObjectId() + serialize | 5,621,583 | 5,635,728 | 0.25% |
| new ObjectId(string) | 3,794,209 | 5,363,446 | 41.36% |
| deserialize | 3,788,455 | 6,131,230 | 61.85% |
| serialize | 67,014,370 | 80,864,967 | 20.65% |
| toHexString | 71,372,677 | 79,949,268 | 12.02% |
| equals | 40,714,896 | 45,167,881 | 10.95% |
| equals-string | 20,697,929 | 42,426,021 | 105.00% |
| getTimestamp | 10,554,392 | 10,591,691 | 0.35% |
| createFromTime | 6,479,506 | 5,801,031 | -10.47% |
| isValid | 128,998 | 20,637,794 | 15895.85% |
What is changing?
Is there new documentation needed for these changes?
This MR deprecates cacheHexString, which may need to be documented.
This MR adds cacheBuffer, which may need to be documented.
What is the motivation for this change?
We have been running into memory issues while pulling large Mongo result sets into memory, and after lots of memory profiling the issue seems to be related to ObjectId, specifically how memory-inefficient Buffer/ArrayBuffer is when storing lots of small Buffers.
We were expecting an ObjectId to consume ~12 bytes of memory (+ some overhead), but in reality it consumes 128 bytes per ObjectId (96 bytes for just the Buffer).
I opened an issue with the NodeJS Performance team but this appears to be working as designed: https://github.com/nodejs/performance/issues/173
Storing a string in Node/V8 is much more memory efficient since it's a primitive. A 24 character hex string only consumes 40 bytes of memory, AND it's much faster to serialize/deserialize.
Release Highlight
Double check the following
- [x] Ran npm run check:lint script
- [x] Self-review completed using the steps outlined here
- [x] PR title follows the correct format: type(NODE-xxxx)[!]: description
  - Example: feat(NODE-1234)!: rewriting everything in coffeescript
- [x] Changes are covered by tests
- [x] New TODOs have a related JIRA ticket
You can create a new generate method that outputs a string, like this:
/**
 * Generate a 12 byte id as a hex string, used in ObjectIds
 *
 * @param time - pass in a second based timestamp.
 */
static generateString(time?: number): string {
  if ('number' !== typeof time) {
    time = Math.floor(Date.now() / 1000);
  }
  const inc = ObjectId.getInc();
  // set PROCESS_UNIQUE if not yet initialized
  if (PROCESS_UNIQUE === null) {
    PROCESS_UNIQUE = ByteUtils.randomBytes(5);
  }
  if (PROCESS_UNIQUE_START_SLICE_HEX === null) {
    PROCESS_UNIQUE_START_SLICE_HEX = ByteUtils.toHex(PROCESS_UNIQUE);
  }
  // 4-byte timestamp (8 hex chars), padded so the result is always 24 chars
  let finalId = (time % (2 ** 32)).toString(16).padStart(8, '0');
  // 5-byte process-unique value (10 hex chars)
  finalId += PROCESS_UNIQUE_START_SLICE_HEX;
  // 3-byte incrementing counter (6 hex chars)
  finalId += (inc & 0xffffff).toString(16).padStart(6, '0');
  return finalId;
}
From my tests, it's slower than generating only the buffer, but faster than generating the buffer and converting it to hex:
<Buffer 7b 5b 0f 3e b7 ca ef 8a 6b 08 ca 28>
Time generate buffer: 348.585796 ms
7b5b0f3eb7caef8a6ba160a8
Time generate buffer.toString('hex'): 1175.228544 ms
7b5b0f3eb7caef8a6ba160a8
Time generateString: 780.711906 ms
Benchmark:
const { performance } = require('perf_hooks');
const jsBson = require('./lib/bson.cjs');
let start = performance.now();
let r;
const now = Date.now();
for (let i = 0; i < 1e7; i++) {
r = jsBson.ObjectId.generate(now);
}
console.log(r);
console.log(`Time generate buffer: ${performance.now() - start} ms`);
start = performance.now();
for (let i = 0; i < 1e7; i++) {
r = jsBson.ObjectId.generate(now).toString('hex');
}
console.log(r);
console.log(`Time generate buffer.toString('hex'): ${performance.now() - start} ms`);
start = performance.now();
for (let i = 0; i < 1e7; i++) {
r = jsBson.ObjectId.generateString(now);
}
console.log(r);
console.log(`Time generateString: ${performance.now() - start} ms`);
Hey folks, just wanted to drop a heads-up that you won't hear back properly for a bit due to the holiday 🎆 but thanks so much for reaching out about this issue! @SeanReece This looks really thoroughly researched; I am looking forward to getting familiar with the details!
@H4ad this is great! I was hesitant to touch generate but I like what you’ve done here. I’ll try to roll it into this change tomorrow and update the benchmarks. You also got me thinking we could use something like parseInt(__id.substring(0,8), 16) to get the time out of the hex string instead of using buffer. If that works and is more performant then we could drop the cacheBuffer option entirely. 🤔
You also got me thinking we could use something like parseInt(__id.substring(0,8), 16) to get the time out of the hex string instead of using buffer.
Yeah, this will work: new Date(parseInt(this.__id.substring(0, 8), 16) * 1000).
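A standalone sketch of that, for clarity (the function name is illustrative; __id is this PR's internal hex-string field):

// Recover the timestamp straight from a 24-character hex ObjectId string, no Buffer needed.
function timestampFromHexId(hexId) {
  // The first 8 hex chars encode the big-endian seconds-since-epoch timestamp.
  const seconds = parseInt(hexId.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

// e.g. timestampFromHexId('65a8e26f1d41c8b5f0a3d2e1') -> a Date in January 2024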
If that works and is more performant then we could drop the cacheBuffer option entirely.
As a compatibility layer, I think keeping the buffer is still worth having, in case someone is relying on something that can only be done via the buffer.
@nbbeeken I've marked the PR ready for review again 👍 Looking forward to your input on this. Let me know if you have any questions.
@nbbeeken I really appreciate the thorough review 👍 Sorry about all the test failures. I think I got so caught up with benchmarking that it completely slipped my mind. I'll get the tests running!
Got any tips to get the BSON benchmarks running on my machine? Even on a clean main, check:granular-bench seems to chug for a bit then spits out "Error: Benchmarks did not run successfully", and check:spec-bench just exits without completing. I would love to be able to test with the official benchmarks.
I do expect there to be some regressions with raw serialization/deserialization since we're now converting/encoding a Buffer to a hex string, instead of just copying bytes, so running a benchmark that just serializes/deserializes is going to compound this. I still believe that through real-life use cases (combinations of deserialization+toJSON, etc), we shouldn't see any performance regressions. AND I'm sincerely hoping that we'll see much better hex encoding performance in V8 once this lands: https://github.com/tc39/proposal-arraybuffer-base64
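For reference, that proposal adds Uint8Array.prototype.toHex() and Uint8Array.fromHex(); availability depends on the engine/Node version, so this is just a feature-detection sketch, not something this PR relies on:

const bytes = new Uint8Array([0x65, 0x4c, 0x1a, 0xfb]);

if (typeof bytes.toHex === 'function' && typeof Uint8Array.fromHex === 'function') {
  const hex = bytes.toHex();                 // '654c1afb' -- engine-native hex encoding
  const roundTrip = Uint8Array.fromHex(hex); // and back again
  console.log(hex, roundTrip);
} else {
  console.log('Uint8Array hex helpers not available in this runtime');
}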
While I was originally brainstorming this change I also played with the idea of having a large buffer to store a pool of objectIds, since the memory inefficiency comes from storing many small buffers. This would mean we wouldn't have any performance regressions from encoding strings on serialization/deserialization, BUT handling garbage collection / trimming the pool is definitely not an easy problem to solve.
Another idea is to have the string representation be configurable as a flag (ie. enableLowMemoryOID) which would trigger the use of string based ObjectIds, but this means a lot more branches, tests, and code to manage.
Got any tips to get the BSON benchmarks running on my machine?
There's a small bug relating to path handling, I think, that makes the benchmarks only run on Linux.
I am able to get it to run on my Mac with the following command:
env LIBRARY=$(pwd) ITERATIONS=1000 WARMUP=1000 npm run check:granular-bench
Then I wrote a quick script to merge two test/bench/etc/resultsCollectedMeans.json files and get the percent diff. I have a clean main and this branch checked out in two separate folders, so I can just change the env variable to get results from each.
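The script is roughly this (a quick sketch; it assumes resultsCollectedMeans.json is a flat benchmark-name → mean map, which may not be the exact shape, and compare-bench.js is just my local helper, not part of the repo):

// node compare-bench.js main/resultsCollectedMeans.json branch/resultsCollectedMeans.json
const fs = require('fs');

const [mainPath, branchPath] = process.argv.slice(2);
const main = JSON.parse(fs.readFileSync(mainPath, 'utf8'));
const branch = JSON.parse(fs.readFileSync(branchPath, 'utf8'));

for (const name of Object.keys(main)) {
  if (!(name in branch)) continue;
  const a = main[name];
  const b = branch[name];
  // symmetric percent difference, same formula as the percDiff column in the updated perf table below
  const percDiff = ((b - a) / ((a + b) / 2)) * 100;
  console.log(`${name}\t${a}\t${b}\t${percDiff.toFixed(4)}`);
}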
I do expect there to be some regressions with raw serialization/deserialization since we're now converting/encoding a Buffer to a hex string, instead of just copying bytes, so running a benchmark that just serializes/deserializes is going to compound this.
That is understandable. Right now our benchmarks are geared towards throughput, hence the MB/s units; we're testing to make sure that we continue to be able to process the same amount of data as we have previously. This focuses solely on the DB interaction side of things and does not measure the throughput of a total application that may be serializing everything to JSON after it leaves the driver. Possibly that is an area we should look at to see if it is seriously impacted.
While I was originally brainstorming this change I also played with the idea of having a large buffer to store a pool of objectIds, since the memory inefficiency comes from storing many small buffers
Isn't this what allocUnsafe does (exposed through ByteUtils.allocateUnsafe)? If I understand it correctly, it has a large ArrayBuffer that it hands out slices of, and when it runs out it starts a new one (not extending the old one); once all the slices (views) of that old pool leave scope, the old pool can be collected as well. Maybe we can make sure ObjectId is using allocUnsafe wherever it can?
Another idea is to have the string representation be configurable as a flag (ie. enableLowMemoryOID)
Yes, this is a possibility, but I agree with the sentiment about branching code; I think we should, and can, find an acceptable middle ground between performance and memory rather than adding that flag 🚀
@nbbeeken I'm also on a Mac (M1). Still no luck running the benchmarks but I'm adding some debug logs to see if I can track it down 🤔
Isn't this what allocUnsafe does
Yes, NodeJS Buffer uses an internal pool to pre-allocate a Buffer of size Buffer.poolSize for performance reasons only, so that calls to allocateUnsafe are faster, but each call to allocateUnsafe still creates a new instance of Buffer/Uint8Array. The memory overhead comes directly from Uint8Array/TypedArray in V8, where each instance of a TypedArray carries a bunch of pointers, etc. Enabling pointer compression in Node would most likely reduce the impact of this, but that's not so easily done.
Here's some details I could find from a V8 developer: https://stackoverflow.com/questions/45803829/memory-overhead-of-typed-arrays-vs-strings/45808835#45808835
We can easily test this out by allocating some Buffers and checking the memoryUsage before/after
const SIZE = 1_000_000; // creating 1 million buffers
const tests = {
allocUnsafe: () => {
const arr = [];
for (let i = 0; i < SIZE; i++) {
arr.push(Buffer.allocUnsafe(12));
}
return arr;
},
emulatePoolUint8Array: () => {
const buffer = new ArrayBuffer(SIZE * 12);
const arr = [];
for (let i = 0; i < SIZE; i++) {
arr.push(new Uint8Array(buffer, i * 12, 12));
}
return arr;
},
createSingleBuffer: () => {
return Buffer.allocUnsafe(SIZE * 12);
},
};
Object.keys(tests).forEach((key) => {
const bef = process.memoryUsage();
const res = tests[key](); // keep a reference so the allocations stay alive while we measure
const aft = process.memoryUsage();
console.log(`${key} used: ${Math.round((aft.rss + aft.external - (bef.rss + bef.external)) / 1024 / 1024)}MB`);
});
1000000 buffers used: 167MB
1000000 Uint8Array used: 140MB
Single big buffer used: 11MB
but each call to allocateUnsafe still creates a new instance of Buffer/Uint8Array
Right, right, that is the center of the issue. Thanks for breaking that down.
While I was originally brainstorming this change I also played with the idea of having a large buffer to store a pool of objectIds, since the memory inefficiency comes from storing many small buffers.
Since we have isolated our interactions with buffers to the mini ByteUtils library, maybe it is possible to manage our own pool as a single Uint8Array and keep an offset into it inside the ObjectId. I think we can clone the Node.js approach where ObjectIds share references to this pool, but eventually they will go out of scope, leaving old pools unreachable and GC-able.
Updated perf
| bench | main | oid_639cc3c | percDiff |
|---|---|---|---|
| objectid_singleFieldDocument._serialize_bson | 9.0120 | 8.5721 | -5.0032 |
| objectid_singleElementArray._serialize_bson | 9.4766 | 10.5094 | 10.3357 |
| objectid_array_10._serialize_bson | 31.8439 | 30.9114 | -2.9718 |
| objectid_array_100._serialize_bson | 206.7586 | 81.2777 | -87.1285 |
| objectid_array_1000._serialize_bson | 222.2954 | 95.2045 | -80.0572 |
| objectid_singleFieldDocument._deserialize_bson | 11.6742 | 11.9383 | 2.2366 |
| objectid_singleElementArray._deserialize_bson | 10.4276 | 10.9879 | 5.2325 |
| objectid_array_10._deserialize_bson | 53.2898 | 32.9102 | -47.2845 |
| objectid_array_100._deserialize_bson | 108.2602 | 67.9801 | -45.7104 |
| objectid_array_1000._deserialize_bson | 131.2962 | 77.6481 | -51.3516 |
I've run an autocannon test against a server that calls toArray on 100,000 documents and JSON stringifies them. This is my attempt at "real worlding" the perf test.
autocannon http://localhost:8080 -d 10 -c 30 -w 3
MB/s throughput of a hyper-express server
main { avg: 24.2327552, median: 22.446079 }
oid { avg: 19.296665609999998, median: 13.467647 }
percDiff { avg: '22.679%', median: '50.000%' }
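For context, the endpoint under test looks roughly like this (sketched with Node's built-in http module rather than hyper-express; the mflix database/collection names and connection string are assumptions):

const http = require('http');
const { MongoClient } = require('mongodb');

async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const comments = client.db('sample_mflix').collection('comments');

  http.createServer(async (req, res) => {
    // toArray() deserializes 100,000 documents (ObjectIds included), then JSON.stringify re-encodes them
    const docs = await comments.find({}, { limit: 100000 }).toArray();
    res.setHeader('content-type', 'application/json');
    res.end(JSON.stringify(docs));
  }).listen(8080);
}

main().catch(console.error);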
they will go out of scope leaving old pools unreachable and GC-able.
@nbbeeken You're a genius 😄 Don't know why this never occurred to me.
Wow, I feel like a bit of an idiot here 😄. For some reason, I assumed Node was maintaining a single large pool and just compacting/trimming old unused space. But looking at the implementation, it's super simple: it just keeps a Uint8Array of size Buffer.poolSize and an offset. Once offset + size reaches the poolSize, a new pool is created and the old pool reference is dropped, so it's theoretically available for GC.
I think we can replicate this easily, reduce memory usage even more, and keep our serialization/deserialization performance. Even if we completely throw out these string changes, I've learned a lot here and there are lots of non-string related changes that would improve performance.
Let me throw together a test to see if this will work. Here's what I'm thinking (rough sketch after the list):
- byte-utils will handle the pool creation and offset.
- poolSize can be configurable (default to ~1000 ObjectIds?).
- Each ObjectId instance must hold a reference to its buffer pool and the OID offset.
- 16 bytes for a Number + pointer saves us ~24 bytes per ObjectId over the string implementation.
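Something like this is what I have in mind (a rough sketch only; ObjectIdPool and its names are illustrative, not the eventual bson API):

// Sketch: an ObjectId-scoped pool in the spirit of Node's internal Buffer pool.
const OID_BYTE_LENGTH = 12;

class ObjectIdPool {
  constructor(poolSize = 1000) {
    this.bytesPerPool = poolSize * OID_BYTE_LENGTH;
    this.pool = new Uint8Array(this.bytesPerPool);
    this.offset = 0;
  }

  // Hands out a { pool, offset } slot. Each ObjectId keeps the pool reference plus its
  // offset, so once every ObjectId pointing at an old pool is collected, the pool is too.
  allocate() {
    if (this.offset + OID_BYTE_LENGTH > this.bytesPerPool) {
      this.pool = new Uint8Array(this.bytesPerPool); // drop the old pool reference
      this.offset = 0;
    }
    const slot = { pool: this.pool, offset: this.offset };
    this.offset += OID_BYTE_LENGTH;
    return slot;
  }
}

// Usage: deserialization would copy the 12 OID bytes into slot.pool at slot.offset.
const pool = new ObjectIdPool();
const slot = pool.allocate();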
You're a genius 😄
tyty! But I must give credit to @addaleax for suggesting the idea of a Uint8Array pool. I have read Node's pool implementation before so I figured we could kindly steal that approach.
Even if we completely throw out these string changes, I've learned a lot here and there are lots of non-string related changes that would improve performance.
Yea! And I've learned a lot here too. I think it was not completely out of the question to begin with. I can imagine most use cases probably end up turning OIDs into hex strings anyway (JSON.stringify will do that), so the REST example could have shown a low impact on performance if it meant we just moved the hex-ify work "up" into BSON. Turns out it has too large an impact, but there was potential there!
Let me throw together a test to see if this will work. Here's what I'm thinking.
Those sound like good jumping-off points. It seems like we intend to make an OID-specific pool; I'm not sure it could be kept general easily 🤔 Making it specific to OIDs means we don't have to return a byteLength along with the byteOffset, since it will always be 12. I think that's fine; we should focus on optimizing the specific case of OIDs given how common they are.
If you'd like to keep this PR as is for comparison, feel free to start a new one but continuing to work here is fine as well. This is really great work, thanks again for donating your time and talents! 💚
Closing in favour of https://github.com/mongodb/js-bson/pull/707
Thanks again for all your help @nbbeeken @H4ad. Looking forward to your input on the buffer pool