ifcplusplus icon indicating copy to clipboard operation
ifcplusplus copied to clipboard

Don't recreate stream and locale in loop

Open Osyotr opened this issue 1 year ago • 2 comments

Using std::locale::classic() instead of creating a new std::locale object on each iteration of the loop gives a massive performance boost. ~~Moving std::stringstream out of the loop further improves performance.~~

These are results from my benchmark:

Run on (20 X 2112 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x10)
  L1 Instruction 32 KiB (x10)
  L2 Unified 1280 KiB (x10)
  L3 Unified 25600 KiB (x1)
----------------------------------------------------------------------
Benchmark                            Time             CPU   Iterations
----------------------------------------------------------------------
NewLocaleEveryIteration          15269 ns        15346 ns        44800
LocaleClassicEveryIteration        560 ns          558 ns      1120000
StreamClearStr                     379 ns          377 ns      1866667
Format                            37.0 ns         36.8 ns     18666667
fmt_lib                           58.4 ns         58.6 ns     11200000
to_chars                          12.8 ns         12.8 ns     56000000

Osyotr avatar Sep 12 '24 23:09 Osyotr

Hmm, doesn't it crash in the parallel for loop? Did you test it in release mode?

Then the loop runs in parallel threads, each using the same tempStream instance: #define FOR_EACH_LOOP std::for_each( std::execution::par,

It could be optimized such that there is a vector<std::stringstream> tempStreamPerThread with int numThreads = std::thread::hardware_concurrency();

so that each thread re-uses the same std::stringstream

ifcapps avatar Sep 13 '24 09:09 ifcapps

You are correct, this is a race condition in release builds. I've moved it back into the loop.

Osyotr avatar Sep 13 '24 10:09 Osyotr