Fast opt-out of shared structures per object?

Open · somebee opened this issue 2 years ago · 3 comments

We are packing tons of objects into a sequential stream. Most objects follow a fairly consistent structure {id,name,...keys}, but within these objects there are several nested objects that I suspect do not benefit much from using structure definitions, like {someId:1,someOtherId:2}, where many of these don't show up more than once among 10k+ objects.

I see the shouldShareStructures option, but from reading the source I don't think it fits our needs. I'm wondering whether it would make sense to be able to pass in a function like shouldUseStructures(value) that can return false when we want to pack an object as a plain old regular object?

Our function would be as simple as (value)=>!!value.id. Any object without an id would then skip all the logic that tests for shared structures, key combinations and so on. I can definitely add it, but I was wondering whether you think it would make a difference at all. I will try hardcoding it locally and do some very informal tests here :)
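To make the intuition concrete, here is a rough, hypothetical model (not msgpackr's actual record encoding, and flattened to top-level objects for simplicity): a record-style encoder writes each distinct key set once as a structure definition and then refers to it, so a key set that occurs only once pays the definition cost with nothing to amortize it against. The sketch counts how many key shapes in a small sample are used only once, with and without the `(value)=>!!value.id` filter. `shapeStats` and the sample data are illustrative assumptions.

```javascript
// Hypothetical model: group objects by their ordered key list ("shape").
// In a record-style encoding each distinct shape is defined once and then
// referenced, so shapes seen only once gain nothing from being defined.
function shapeStats(objects, useRecord = () => true) {
  const counts = new Map();
  for (const obj of objects) {
    if (!useRecord(obj)) continue; // would be written as a plain map instead
    const shape = Object.keys(obj).join(',');
    counts.set(shape, (counts.get(shape) || 0) + 1);
  }
  let singletons = 0;
  for (const n of counts.values()) if (n === 1) singletons++;
  return { shapes: counts.size, singletons };
}

// Sample data loosely modeled on the issue: consistent {id, name} objects
// plus small id-less objects whose key sets rarely repeat.
const sample = [
  { id: 1, name: 'a' },
  { id: 2, name: 'b' },
  { someId: 1, someOtherId: 2 },
  { otherId: 3, yetAnotherId: 4 },
];

const all = shapeStats(sample);
const filtered = shapeStats(sample, (value) => !!value.id);
console.log(all, filtered);
```

Without the filter, three shapes get defined and two of them are used exactly once; with the filter, only the repeated `id,name` shape is defined. Note that `!!value.id` also treats an `id` of `0` as "no id".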

somebee, May 31 '23 05:05

Fwiw, skipping structures for these objects reduced the compressed size of the full stream from 500kb to 470kb, and the uncompressed size by ~200kb. Unpacking performance seems about the same, but it's hard to say since it's so damn fast either way. Intuitively I would expect packing to be faster as well, but I haven't made an isolated test case for it.

somebee, May 31 '23 06:05

Made an isolated test with the real-world data we have.

useRecords: function – 1.35 MB – pack: 10.2498 ms, unpack: 9.3448 ms
useRecords: true – 1.46 MB – pack: 11.4979 ms, unpack: 15.3999 ms
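For a quick read of the two runs above, the relative changes can be computed directly from the reported figures (the numbers are copied from the results; nothing new is measured). `reduction` is just an illustrative helper.

```javascript
// Figures from the two runs: predicate function vs. plain useRecords.
const withFn = { sizeMb: 1.35, packMs: 10.2498, unpackMs: 9.3448 };
const plain = { sizeMb: 1.46, packMs: 11.4979, unpackMs: 15.3999 };

// Percentage reduction of `a` relative to the baseline `b`.
const reduction = (a, b) => ((b - a) / b) * 100;

for (const key of Object.keys(plain)) {
  console.log(`${key}: ${reduction(withFn[key], plain[key]).toFixed(1)}% lower`);
}
```

This works out to roughly 7.5% smaller output, 10.9% less pack time and 39.3% less unpack time. These come from single runs, so the timing percentages in particular are indicative rather than definitive.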

So, in our use case it actually makes quite a substantial difference. I'll submit a PR today where the only public-facing change is that useRecords can be supplied as either a function or a boolean. If it is a function, the packer will essentially call useRecords(value) for each value and fall back to writePlainObject when it returns false. If useRecords is set to true or false, no code paths change, so there is no performance impact in any other case.
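The dispatch described above could look roughly like this. `writeRecord`, `writePlainObject` and `makeObjectWriter` here are hypothetical stand-ins that only tag their output, not msgpackr internals; the point of the sketch is that the boolean case resolves to a single path up front and never calls a predicate, so existing configurations behave as before.

```javascript
// Hypothetical stand-ins for the two encoding paths.
const writeRecord = (value) => ({ kind: 'record', value });
const writePlainObject = (value) => ({ kind: 'map', value });

// Build an object writer from a useRecords option that may be a boolean
// or a per-object predicate.
function makeObjectWriter(useRecords) {
  if (typeof useRecords !== 'function') {
    // Boolean: pick the path once; no per-object check is added.
    return useRecords ? writeRecord : writePlainObject;
  }
  // Function: decide per object, falling back to a plain map when false.
  return (value) => (useRecords(value) ? writeRecord(value) : writePlainObject(value));
}

const write = makeObjectWriter((value) => !!value.id);
console.log(write({ id: 1, name: 'a' }).kind); // record
console.log(write({ someId: 1, someOtherId: 2 }).kind); // map
```

Resolving the boolean case once, instead of checking `typeof useRecords` on every object, is what keeps the non-function configurations free of extra per-object work.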

somebee, May 31 '23 07:05

I assume it wouldn't be viable to use a Map for the objects that are... maps :) (i.e., those without a consistent structure)?
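A sketch of that alternative: before packing, convert the id-less, map-like objects into `Map` instances so the encoder can treat them as maps rather than structure candidates. This assumes the encoder writes `Map` values as plain MessagePack maps, and that the unpacking side is configured to accept them back as `Map`s or objects. `toMapsWhereUnstructured` is a hypothetical helper, and `!!v.id` is the predicate proposed earlier in this thread.

```javascript
// Recursively replace plain objects that fail the predicate with Maps,
// leaving structured objects (and everything else) as they are.
function toMapsWhereUnstructured(value, isStructured = (v) => !!v.id) {
  if (Array.isArray(value)) {
    return value.map((item) => toMapsWhereUnstructured(item, isStructured));
  }
  if (value === null || typeof value !== 'object' ||
      Object.getPrototypeOf(value) !== Object.prototype) {
    return value; // primitives, Dates, Maps, typed arrays, etc. pass through
  }
  const entries = Object.entries(value).map(
    ([key, v]) => [key, toMapsWhereUnstructured(v, isStructured)]
  );
  return isStructured(value) ? Object.fromEntries(entries) : new Map(entries);
}

const input = { id: 7, name: 'x', refs: { someId: 1, someOtherId: 2 } };
const prepared = toMapsWhereUnstructured(input);
console.log(prepared.refs instanceof Map); // true
console.log(prepared.refs.get('someId')); // 1
```

The trade-off is an extra copy of the object tree on every pack, plus consumers receiving `Map`s instead of objects, which is why producing `Map`s directly in the application (where possible) would be cheaper than converting just before packing.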

kriszyp, May 31 '23 22:05