Don't recreate stream and locale in loop
Using `std::locale::classic()` instead of creating a new `std::locale` object on each iteration of the loop cuts the time per iteration from about 15269 ns to 560 ns in my benchmark, roughly a 27x speedup. ~~Moving std::stringstream out of the loop further improves performance.~~
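For illustration, here is a minimal sketch of the pattern described above. The function name `formatAll` and the choice of `double` as the formatted type are assumptions made for the example, not the code in this PR. The stream is still created per iteration; only the locale construction is avoided.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats each value with a locale-independent stream.
std::vector<std::string> formatAll(const std::vector<double>& values) {
    std::vector<std::string> out;
    out.reserve(values.size());
    for (double v : values) {
        std::ostringstream stream;  // still constructed on every iteration
        // classic() returns a reference to the already existing "C" locale,
        // so no new std::locale object is built here.
        stream.imbue(std::locale::classic());
        stream << v;
        out.push_back(stream.str());
    }
    return out;
}
```

Because the stream is local to each iteration, this variant shares no state between iterations.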
These are results from my benchmark:
Run on (20 X 2112 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x10)
L1 Instruction 32 KiB (x10)
L2 Unified 1280 KiB (x10)
L3 Unified 25600 KiB (x1)

| Benchmark | Time | CPU | Iterations |
|---|---:|---:|---:|
| NewLocaleEveryIteration | 15269 ns | 15346 ns | 44800 |
| LocaleClassicEveryIteration | 560 ns | 558 ns | 1120000 |
| StreamClearStr | 379 ns | 377 ns | 1866667 |
| Format | 37.0 ns | 36.8 ns | 18666667 |
| fmt_lib | 58.4 ns | 58.6 ns | 11200000 |
| to_chars | 12.8 ns | 12.8 ns | 56000000 |
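
The `to_chars` row presumably refers to `std::to_chars` from `<charconv>`, which is locale-independent by design. A rough sketch of what such a conversion might look like follows. The helper name `formatWithToChars`, the `double` value type, and the buffer size are assumptions for the example, not the benchmarked code. The floating-point overloads need a C++17 standard library that implements them.

```cpp
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper, not the code measured in the benchmark above.
std::string formatWithToChars(double value) {
    // 32 bytes is enough for the shortest round-trip form of any double.
    char buffer[32];
    const auto result = std::to_chars(buffer, buffer + sizeof(buffer), value);
    if (result.ec != std::errc()) {
        return {};  // conversion failed; return an empty string
    }
    // result.ptr points one past the last character written.
    return std::string(buffer, result.ptr);
}
```

No stream or locale object is involved here, so nothing is shared between threads.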
Hmm, doesn't it crash in the parallel for loop? Did you test it in release mode? In release mode the loop runs in parallel threads, each using the same `tempStream` instance: `#define FOR_EACH_LOOP std::for_each( std::execution::par,`
It could be optimized by keeping a `vector<std::stringstream> tempStreamPerThread`, sized with `int numThreads = std::thread::hardware_concurrency();`, so that each thread re-uses its own `std::stringstream`.
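
A possible alternative to the per-thread vector is a `thread_local` stream, since `std::execution::par` does not expose a thread index to use for the vector lookup. The sketch below is only an illustration under that assumption. The names `ClassicStream` and `formatValue` are hypothetical, and `double` is an assumed value type.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical wrapper that imbues the classic locale once per thread.
struct ClassicStream {
    std::ostringstream stream;
    ClassicStream() { stream.imbue(std::locale::classic()); }
};

// Hypothetical helper: each calling thread gets its own stream instance.
std::string formatValue(double value) {
    thread_local ClassicStream perThread;  // constructed on first use per thread
    std::ostringstream& stream = perThread.stream;
    stream.str(std::string());  // discard the previous contents
    stream.clear();             // reset any error flags
    stream << value;
    return stream.str();
}
```

Calling `formatValue` from the body of the parallel loop would then re-use one stream per worker thread without sharing it across threads. Whether this beats constructing the stream inside the loop would need to be measured.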
You are correct, this is a race condition in release builds. I've moved it back into the loop.