minutor
minutor copied to clipboard
Clearing chunk cache takes ages
As of commit 4ff205387562afc44f92bedf29fecea1788a7b5f, clearing the chunk cache (i. e. refreshing or reloading the map) takes an awful lot of time - long minutes (!) even in a Release build. It seems that most of the time is spent deleting the per-section palettes, mostly the properties member.
Tested on a self-compiled Win64 binary built with Qt 5.15.1.
I experienced this behavior for a much longer time. But only for Debug builds!
The mentioned commit only removes Ubuntu 16.04 support from the build chain. (No code change.) So the problem is older and as mentioned probably in the Chunk destructor.
I did some digging around and finally found out the culprit is not Minutor itself, but the QtCreator's debugger. When running under the debugger, the QMap and QString operations take about 5 - 60 times longer than when running in a standalone executable. For me, this makes even Release builds barely usable under QtCreator, reloading after displaying just a screenful of a world takes several minutes.
I even made a simple test project to measure the QMap<QString> performance against std::map<QString>, std::map<std::string>, QMap<std::string>, std::undordered_map<QString> and std::unordered_map<std::string>. The test fills 50k instances of map with 30 items (9-char key, 9-char value) each. The results below are from a MSVC 2019 Release build on Win8.1, standalone executable:
QMap-QString : Filling the containers took 686 msec | Clearing took 220 msec
StdMap-QString : Filling the containers took 507 msec | Clearing took 211 msec
StdMap-StdString : Filling the containers took 373 msec | Clearing took 186 msec
QMap-StdString : Filling the containers took 493 msec | Clearing took 139 msec
StdUnorderedMap-QString : Filling the containers took 447 msec | Clearing took 190 msec
StdUnorderedMap-StdString: Filling the containers took 336 msec | Clearing took 147 msec
I assume the biggest difference between std::string and QString is that the latter is in UTF-16, so there's some conversion overhead when creating the strings, as well as comparison going through OS's wide character handling routines DLL call instead of direct in-code comparison. So if Minutor was to change to a std::unordered_map<std::string, QVariant> storage for the per-chunk-section block palette, it would run twice as fast.
For comparison, the very same executable, when run under QtCreator's debugger, produces these results:
QMap-QString : Filling the containers took 3072 msec | Clearing took 13467 msec
StdMap-QString : Filling the containers took 38562 msec |Clearing took 6462 msec
StdMap-StdString : Filling the containers took 21542 msec |Clearing took 5902 msec
QMap-StdString : Filling the containers took 11225 msec | Clearing took 9416 msec
StdUnorderedMap-QString : Filling the containers took 18860 mse c| Clearing took 8279 msec
StdUnorderedMap-StdString: Filling the containers took 22438 msec | Clearing took 7316 msec
Here if we compare the std::unordered_map<std::string, QVariant> to the current code, it is actually much slower on the "load" side and not that much faster on the "unload" part.
I think this issue can be closed, with "won't fix, have workaround".
Is that still the case under Qt6?
In case of Palette entries, we know that Mojang will always use english names. Therefore these strings can always be represented with 7bit ASCII (or UTF-8 to have some margin). So we definitely do not need UTF-16 support there.
I will think about it during August . . . Could you share your test project, so that we can repeat the measurement on different platforms?
I haven't tested this with anything other than Qt 5.15.1 on Windows. I expect it to be Windows-specific and QtCreator-specific (rather than Qt-specific).
My thinking was that Minutor could instead use a global palette (number -> complete block identification) and have each chunk section provide a number -> number map into it. Then we could even use std::string_view while loading the palette, since the palette data is only needed while loading and it's already in the NBT's memory. This could provide even more speedup. But that's the long-term future, I suppose :)
Here's the test project, feel free to experiment with it and let me know if you find anything interesting: https://github.com/madmaxoft/StringMapSpeedTest Note that it's a poor quality code, jsut as proof-of-concept, so don't judge me :)
? We are exactly doing that already ?
There is ONE global definition with all Blocks having a number (hash of ID name). While loading the Section specific Palette, the strings are translated into numbers and stored in an array for each possible Block position (16x16x16 numbers per Section). The very first flatting approach was using strings to access the global palette. But that was the first thing I changed for performance gain.
I don't think that's what Minutor does, at least in the 1.13+ branch of the loading code. The section's local palette seems to stay allocated until the entire Chunk is deleted:
Loading: https://github.com/mrkite/minutor/blob/master/chunk.cpp#L276
Freeing: https://github.com/mrkite/minutor/blob/master/chunk.cpp#L28-L34
Maybe we mean the same, but use different words?
Per Section the Palette with string IDs is loaded and converted into a HID (hashed ID). These HIDs are later used to get Block information from the global BlockIdentifier. That is what I mean with "number -> BlockInfo"
Probably the string representation of the Palette stays (unused) in memory and some more ugly stuff.
When we need BlockInfo for rendering, we access it by numbers. When we generate the ToolTip in the status bar, we use the string representation.
Here are my results (I put both values of a test into one line for better comparison):
[Release Build]
QMap-QString : Filling took 457 msec | Clearing took 208 msec
StdMap-QString : Filling took 383 msec | Clearing took 225 msec
StdMap-StdString : Filling took 286 msec | Clearing took 168 msec
QMap-StdString : Filling took 341 msec | Clearing took 144 msec
StdUnorderedMap-StdString : Filling took 262 msec | Clearing took 150 msec
[Debug Build]
QMap-QString : Filling took 4808 msec | Clearing took 1004 msec
StdMap-QString : Filling took 5449 msec | Clearing took 1108 msec
StdMap-StdString : Filling took 4243 msec | Clearing took 1184 msec
QMap-StdString : Filling took 3877 msec | Clearing took 1101 msec
StdUnorderedMap-StdString : Filling took 3840 msec | Clearing took 1160 msec
I do not have relevant difference if I start the executable from a console window or from QtCreator.
AMD Ryzen 5 3600, Windows 10, QtCreator 4.12.3, Qt 5.12.10, MSVC 2019 It seems, that std::whatever is slightly faster. I would be interested in more results form other platforms, compilers and CPU architectures.
Access times too quick to accurately measure?
Yes, the values are different each run. But not that bad to draw a general conclusion.
Before changing our data types I would do more benchmarking. This benchmark is very limited, as it uses only 30 items. We have to repeat for each location with more reasonable numbers.
std::map is claimed to be O( log(N) ) std::unordered_map is claimed to be O( 1 )
Therefore the unordered_map should always be faster when the map is large enough.
I think another point to consider is character encoding - QString uses UTF-16 internally, while std::string uses whatever we push into it, so it can have the native NBT encoding (UTF-8?) So std::string should use less memory and be faster to read since there's no encoding conversion.
I'm already convinced to change data types that hold NBT data to std::string.
But also there, I plan some benchmarking (after 1.18 updates are working).