bsf
Slow debug startUp
We already had the discussion here: https://discourse.bsframework.io/t/why-does-it-take-so-long-to-open-the-window/65/8
I had some time to dig a little deeper into the code and, as I already mentioned, there are some inefficient parts. You always create a temporary stack for every iteration over the mRTTIDerivedClasses containers. I did some testing and was able to reduce the performance hit in those functions dramatically with a plain std::vector. It becomes even better if you simply use recursion.
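To illustrate what I mean, here is a minimal sketch of the recursive variant. The struct and member names are made up for illustration and are not the actual bsf code; the point is just that the recursion appends into a single caller-owned vector instead of allocating a temporary stack per call:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an RTTI type node (names are assumptions,
// not the real bsf API).
struct RTTIType
{
    std::string name;
    std::vector<RTTIType*> mRTTIDerivedClasses;
};

// Recursive walk: no per-iteration container is created; all results
// are appended into one vector owned by the caller.
void collectDerived(const RTTIType* type, std::vector<const RTTIType*>& out)
{
    for (const RTTIType* derived : type->mRTTIDerivedClasses)
    {
        out.push_back(derived);
        collectDerived(derived, out);
    }
}
```

The caller creates the output vector once and can even reuse it across calls (with a `clear()`), so the per-lookup allocation cost mostly disappears.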
During the research I asked myself whether it wouldn't be better to sort the mRTTIDerivedClasses containers, so that binary searches could be performed instead of linear ones.
For testing I added a SkipList class implementation to the project and was finally able to start up the engine in less than 11 seconds (down from over 35). The disadvantage of this kind of data structure (under the hood it's simply a sorted std::vector) is the performance hit during insertion, but it is very efficient for searching.
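For clarity, here is a stripped-down sketch of the "sorted std::vector" idea (this is a simplification for the discussion, not my actual linked implementation): insertion pays an O(n) shift so that every lookup becomes an O(log n) binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal sorted-vector container: slow inserts, fast searches.
template <typename T>
class SortedVector
{
public:
    void insert(const T& value)
    {
        // O(log n) to find the slot, O(n) to shift the tail.
        auto it = std::lower_bound(mData.begin(), mData.end(), value);
        mData.insert(it, value);
    }

    bool contains(const T& value) const
    {
        // O(log n) binary search instead of a linear scan.
        return std::binary_search(mData.begin(), mData.end(), value);
    }

    std::size_t size() const { return mData.size(); }

private:
    std::vector<T> mData;
};
```

Since the RTTI containers are filled once during type registration and then queried many times during startup, this trade-off favors the read side.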
Sure, in release mode the performance gains are smaller, but it shouldn't hurt either.
So, my question is: what's your opinion on that topic? Would you like to see the SkipList solution, or do you already have other plans to reduce the startup time?
Hi. I've already modified that code locally to use std::unordered_map and fully gotten rid of the recursion; the map now holds all type entries for lookup.
I'm not sure how it compares with binary search, but I believe it should be much faster than the original approach in any case. If you can verify the binary search approach is faster, I can use that instead (although I'd prefer to use std::sort and std::binary_search on a std::vector rather than a new container type).
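Concretely, the approach I have in mind would look roughly like this (type names here are placeholders, not the bsf code): fill a plain std::vector during registration, sort it once when registration is done, then binary-search on every query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Placeholder type ID; in practice this would be whatever key the
// RTTI lookup uses.
using TypeId = std::uint32_t;

// Call once after all types have been registered.
void finalizeRegistry(std::vector<TypeId>& types)
{
    std::sort(types.begin(), types.end());
}

// Precondition: finalizeRegistry() was called after the last insert.
bool isRegistered(const std::vector<TypeId>& types, TypeId id)
{
    return std::binary_search(types.begin(), types.end(), id);
}
```

This keeps the storage a contiguous std::vector (cache-friendly iteration, no extra container type) while still giving O(log n) lookups.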
Hey,
Thanks for your work. Indeed, the new version is a lot faster than the previous one; I was able to start in less than 11 seconds. Locally I exchanged the type of mAllRTTITypes with a SkipList and gained another ~2 second speedup.
Btw, internally the SkipList uses a std::vector with std::sort, std::lower_bound, std::upper_bound and so on. You can have a look at my implementation here: https://github.com/DNKpp/SkipList/blob/master/SkipList.hpp
Interesting, I wouldn't have imagined it would make that much of a difference. I also submitted a refactor of the serializer that avoids a lot of dynamic allocations, which should help as well.
When I get another chance I'll look into it more. As I said before shaders that have a lot of variations are the main culprit and there might be ways around this. Some ideas (mostly writing this for myself later on):
- Shader variations could be streamed on-demand (they are already compiled on-demand, so the addition of streaming might not be too much work)
- Most variations share 90% of the code/meta-data. Instead of each variation storing its own copy, with a lot of redundant data, we could store the meta-data in a central location and add per-parameter flags determining which variations use them. Might result in a mess of a design though.
- Variations on render backends that support bytecode caching shouldn't need to store the program source. This might not help much with load times, but some shaders can get pretty big on disk (even a few MB uncompressed in rare cases), which is too much.
- Data for variations for render backends other than the current one shouldn't be loaded at all.
Those might be overkill unless shader loading times also become an issue with an optimized build.