Mappings reading optimization
This PR introduces a sequential pattern for deserializing mappings. Instead of converting the whole stream into an NBT object and then deserializing that object, we now convert data directly from the stream into our mappings objects, without any intermediate NBT object.
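To illustrate the idea, here is a minimal, self-contained sketch of the pattern (not the actual PacketEvents implementation — the class and entry layout are hypothetical): each entry is consumed from the stream and stored straight into the target map, so no intermediate tree of NBT objects is ever allocated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of sequential deserialization: read each entry from the
// stream and fill the target mappings object directly, instead of first
// building a full intermediate object tree and deserializing that.
public class SequentialMappingsSketch {

    static Map<Integer, Integer> readMappings(DataInputStream in) throws IOException {
        int count = in.readInt();
        Map<Integer, Integer> mappings = new HashMap<>(count);
        for (int i = 0; i < count; i++) {
            // Each pair is consumed and stored immediately; nothing else
            // is kept alive between iterations.
            mappings.put(in.readInt(), in.readInt());
        }
        return mappings;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny fake mappings payload: count, then (key, value) pairs.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(2);
        out.writeInt(1); out.writeInt(100);
        out.writeInt(2); out.writeInt(200);

        Map<Integer, Integer> mappings =
                readMappings(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(mappings.get(1) + "," + mappings.get(2));
    }
}
```

The payoff is that short-lived per-entry reads replace one large allocation, which is what the peak-memory measurements further down in this thread are probing.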
Previous Approach
Current Approach
Changelogs
- [x] Rework NBTLimiter class with better code design
- [x] NBTSerializer read functions now have package visibility
- [x] SequentialNBTReader implementation (reads the data stream one tag at a time)
- [x] Mappings format v2
Issues
#857
I am always amazed how you, and Retrooper and Booky for that matter, come up with ideas and improvements like these. This looks really well done!
Nice work. Did someone run a profiler and compared both versions yet?
Not sure, but feel free to do so :-)
So I did a quick check and don't see that much of an improvement.
I tested it using a fresh 1.21.0 Spigot server, built both PE jars, and used the 2.4.1-SNAPSHOT version.
Additionally, I used IntelliJ's profiler with this run configuration:
I focused on the PacketEventsPlugin#onLoad method and its memory/cpu usage:
Before
CPU time = 1674ms
Memory allocations = 691.33 MB
After
CPU time = 1807ms
Memory allocations = 668.01 MB
Here are the full JFR snapshots generated: snapshots.zip
I just checked again with VisualVM, loading the .jfr snapshots, and found that the heap size dropped.
So it seems like the peak memory usage is overall smaller:
Before
After
That's some great news! Most of the objects are dropped after a single read, but the Java GC doesn't clear them at that very moment. It would be interesting to see how much overall memory this saves!
Overall they use roughly the same amount of memory, but the new version seems to get rid of unneeded data earlier, making the peak memory usage of that part smaller.
Before
After
Seems like the readFully function used for caching is the only bottleneck left. I could probably optimize that too...
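For context on why readFully-style caching shows up in allocation profiles: it requires one array the size of the entire payload up front. A sketch of the usual alternative (hypothetical helper, not code from this PR) is to copy through a small fixed buffer, which keeps the peak allocation bounded regardless of payload size:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch: instead of readFully() into a whole-payload byte[],
// stream through a fixed-size buffer so peak allocation stays constant.
public class ChunkedCopySketch {

    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // fixed-size buffer, reused for every chunk
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[100_000]; // stand-in for a mappings payload
        ByteArrayOutputStream cache = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(payload), cache);
        System.out.println(copied);
    }
}
```

Whether this helps in practice depends on whether the cached bytes must all be resident anyway; if the consumer can also process them incrementally, the large array disappears entirely.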
Doesn't look like that much of a change, to be honest.
Second Version
We actually improved the NBTCompound caching part!
No more HashMaps are created in memory while loading block mappings.
And the heap size also reduced by 20 MiB.
Thank you for this clever optimization @AbhigyaKrishna!