<format>: Improve performance
Reported by @jovibor - here are examples where <format> is 20% to 70% slower than libfmt:
https://github.com/microsoft/STL/issues/1800#issuecomment-812815620 https://github.com/microsoft/STL/issues/1800#issuecomment-812838596
The issue is probably caused by not preallocating any memory for the output string. I think reserving the size of the format string would be a good guess.
The issue is probably caused by not preallocating any memory for the output string
Shouldn't the strings from the benchmark fit into the Small String Buffer and not need allocation at all?
I think reserving the size of the format string would be a good guess.
At least for simple formaters, I think this is almost guaranteed to be too short {} is only two bytes. Integral numbers only fit in that for very small values, floating point will essentially always require more and if the argument is a string or some more complicated type just two chars will also almost never be enough.
Not that I have a good counterproposal, but usually, it is better to overallocate by a few bytes instread of allocating just not enough and having to reallocate.
Any idea what fmt does? Are you even allowed to look at the fmt source code?
The "interpolated strings" feature in C#, which is extremely similar to <format> is lowered by the compiler in terms of the runtime's implementation of String.Format(...), which in turn defers to FormatHelper, which speculatively allocates <format string length> + <number of args> * 8.
Maybe it's worth finding out whether this is an informed guess, the result of analyzing the runtime behaviour of existing codebases, or just 🎩✨🐇 ¯_(ツ)_/¯
Shouldn't the strings from the benchmark fit into the Small String Buffer and not need allocation at all?
They were wstrings which have half the Small String Buffer of string.
which speculatively allocates
<format string length> + <number of args> * 8.
Taking number of args into account might also be a good idea. Maybe we could even filter string(_view) arguments from args and take their lengths into account.
This is definitely related to the formatter string size.
When string is too long (avoiding SSO) the results for std::format become even worse, it's about two times slower than fmt.
The difference between wstring and string formatters is also significant, with the latter being faster.
Here the full test code with format_to and swprintf included.
I'm still a bit concerned about performance. After all the recent changes, the difference is as follows:

The std::format_to is almost on pair with fmt::format_to.
While std::format is more than 3 times slower than its fmt counterpart.
Microsoft Visual Studio 16.10.0 Preview 2.1 /O2 /utf-8
std::format version: 02ae70a
Can you share your benchmark code?
Can you share your benchmark code?
The link is one post above.
Thanks, after looking at it through a profiler, we are spending an absurd amount of time in _Fmt_iterator_buffer<back_insert_iterator<wstring>>::_Grow. If I replace the format_to benchmark to use a back_insert_iterator, I am seeing a similar amounts of time.
#1894 should resolve this
performance is really terrible .
fmt 8.1.0 Visual Studio 17.1 preview 3
#define _CRT_SECURE_NO_WARNINGS
#include <cstdio>
#include <format>
#include <benchmark/benchmark.h>
#include <fmt/format.h>
void std_sprintf(benchmark::State& s) {
for (auto _ : s) {
char buffer[12];
unsigned size = std::sprintf(buffer, "%d", 42);
}
}
BENCHMARK(std_sprintf);
void std_format(benchmark::State& s) {
for (auto _ : s) {
auto str = std::format("{}", 42);
}
}
BENCHMARK(std_format);
void std_format_2(benchmark::State& s) {
for (auto _ : s) {
auto str = std::format("{:d}", 42);
}
}
BENCHMARK(std_format_2);
void fmt_format(benchmark::State& s) {
for (auto _ : s)
auto str = fmt::format("{}", 42);
}
BENCHMARK(fmt_format);
void fmt_format_2(benchmark::State& s) {
for (auto _ : s)
auto str = fmt::format("{:d}", 42);
}
BENCHMARK(fmt_format_2);
void std_format_to(benchmark::State& s) {
std::string buffer;
buffer.reserve(12);
for (auto _ : s) {
auto str = std::format_to(std::back_inserter(buffer), "{}", 42);
}
}
BENCHMARK(std_format_to);
void std_format_to_2(benchmark::State& s) {
std::string buffer;
buffer.reserve(12);
for (auto _ : s) {
auto str = std::format_to(std::back_inserter(buffer), "{:d}", 42);
}
}
BENCHMARK(std_format_to_2);
void fmt_format_to(benchmark::State& s) {
std::string buffer;
buffer.reserve(12);
for (auto _ : s)
auto str = fmt::format_to(std::back_inserter(buffer), "{}", 42);
}
BENCHMARK(fmt_format_to);
void fmt_format_to_2(benchmark::State& s) {
std::string buffer;
buffer.reserve(12);
for (auto _ : s)
auto str = fmt::format_to(std::back_inserter(buffer), "{:d}", 42);
}
BENCHMARK(fmt_format_to_2);
BENCHMARK_MAIN();
Running bench.exe
Run on (4 X 2712 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 256 KiB (x4)
L3 Unified 6144 KiB (x1)
----------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------
std_sprintf 63.5 ns 56.2 ns 10000000
std_format 3475 ns 3223 ns 213333
std_format_2 7274 ns 6719 ns 100000
fmt_format 41.2 ns 39.3 ns 18666667
fmt_format_2 85.4 ns 83.7 ns 7466667
std_format_to 3398 ns 3209 ns 224000
std_format_to_2 7051 ns 6417 ns 112000
fmt_format_to 34.8 ns 33.7 ns 19478261
fmt_format_to_2 62.6 ns 60.9 ns 10000000
@crackedmind Would you mind benchmarking wide-string counterparts (L"") as well?
Also, it was mentioned previously to compile with /utf-8 compiler flag, to improve performance.
/utf-8 makes an ENORMOUS difference, we're working on spreading that effect to other encodings.
Benchmark code:
#define _CRT_SECURE_NO_WARNINGS
#include <cstdio>
#include <format>
#include <benchmark/benchmark.h>
#include <fmt/format.h>
#include <fmt/xchar.h>
void std_sprintf(benchmark::State& s) {
for (auto _ : s) {
char buffer[12];
unsigned size = std::sprintf(buffer, "%d", 42);
}
}
BENCHMARK(std_sprintf);
void std_format(benchmark::State& s) {
for (auto _ : s) {
auto str = std::format("{}", 42);
}
}
BENCHMARK(std_format);
void std_format_2(benchmark::State& s) {
for (auto _ : s) {
auto str = std::format("{:d}", 42);
}
}
BENCHMARK(std_format_2);
void fmt_format(benchmark::State& s) {
for (auto _ : s)
auto str = fmt::format("{}", 42);
}
BENCHMARK(fmt_format);
void fmt_format_2(benchmark::State& s) {
for (auto _ : s)
auto str = fmt::format("{:d}", 42);
}
BENCHMARK(fmt_format_2);
void std_format_to(benchmark::State& s) {
std::string buffer;
buffer.reserve(12);
for (auto _ : s) {
auto str = std::format_to(std::back_inserter(buffer), "{}", 42);
}
}
BENCHMARK(std_format_to);
void std_format_to_2(benchmark::State& s) {
std::string buffer;
buffer.reserve(12);
for (auto _ : s) {
auto str = std::format_to(std::back_inserter(buffer), "{:d}", 42);
}
}
BENCHMARK(std_format_to_2);
void fmt_format_to(benchmark::State& s) {
std::string buffer;
buffer.reserve(12);
for (auto _ : s)
auto str = fmt::format_to(std::back_inserter(buffer), "{}", 42);
}
BENCHMARK(fmt_format_to);
void fmt_format_to_2(benchmark::State& s) {
std::string buffer;
buffer.reserve(12);
for (auto _ : s)
auto str = fmt::format_to(std::back_inserter(buffer), "{:d}", 42);
}
BENCHMARK(fmt_format_to_2);
void std_swprintf(benchmark::State& s) {
for (auto _ : s) {
wchar_t buffer[12];
unsigned size = std::swprintf(buffer, L"%d", 42);
}
}
BENCHMARK(std_swprintf);
void std_wformat(benchmark::State& s) {
for (auto _ : s) {
auto str = std::format(L"{}", 42);
}
}
BENCHMARK(std_wformat);
void std_wformat_2(benchmark::State& s) {
for (auto _ : s) {
auto str = std::format(L"{:d}", 42);
}
}
BENCHMARK(std_wformat_2);
void fmt_wformat(benchmark::State& s) {
for (auto _ : s)
auto str = fmt::format(L"{}", 42);
}
BENCHMARK(fmt_wformat);
void fmt_wformat_2(benchmark::State& s) {
for (auto _ : s)
auto str = fmt::format(L"{:d}", 42);
}
BENCHMARK(fmt_wformat_2);
void std_wformat_to(benchmark::State& s) {
std::wstring buffer;
buffer.reserve(12);
for (auto _ : s) {
auto str = std::format_to(std::back_inserter(buffer), L"{}", 42);
}
}
BENCHMARK(std_wformat_to);
void std_wformat_to_2(benchmark::State& s) {
std::wstring buffer;
buffer.reserve(12);
for (auto _ : s) {
auto str = std::format_to(std::back_inserter(buffer), L"{:d}", 42);
}
}
BENCHMARK(std_wformat_to_2);
void fmt_wformat_to(benchmark::State& s) {
std::wstring buffer;
buffer.reserve(12);
for (auto _ : s)
auto str = fmt::format_to(std::back_inserter(buffer), L"{}", 42);
}
BENCHMARK(fmt_wformat_to);
void fmt_wformat_to_2(benchmark::State& s) {
std::wstring buffer;
buffer.reserve(12);
for (auto _ : s)
auto str = fmt::format_to(std::back_inserter(buffer), L"{:d}", 42);
}
BENCHMARK(fmt_wformat_to_2);
BENCHMARK_MAIN();
Benchmark results:
C:\Users\Casey\OneDrive\Desktop>cl /nologo /std:c++latest /EHsc /I c:\STL\stl\inc c:\Users\Casey\OneDrive\Desktop\repro.cpp /I c:\Users\Casey\Repos\vcpkg\installed\x64-windows\include /MD /O2 /Ob3 /utf-8 /link /LIBPATH:c:\Users\Casey\Repos\vcpkg\installed\x64-windows\lib benchmark.lib fmt.lib shlwapi.lib
repro.cpp
C:\Users\Casey\OneDrive\Desktop>repro
2022-01-23T17:36:52-08:00
Running repro
Run on (32 X 3493 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 512 KiB (x16)
L3 Unified 16384 KiB (x4)
-----------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------
std_sprintf 53.1 ns 53.1 ns 10000000
std_format 42.0 ns 42.4 ns 16592593
std_format_2 83.4 ns 83.7 ns 7466667
fmt_format 30.9 ns 30.7 ns 22400000
fmt_format_2 72.7 ns 73.2 ns 8960000
std_format_to 43.3 ns 43.0 ns 16000000
std_format_to_2 83.7 ns 83.7 ns 8960000
fmt_format_to 25.9 ns 25.5 ns 26352941
fmt_format_to_2 52.4 ns 53.1 ns 10000000
std_swprintf 38.9 ns 39.3 ns 18666667
std_wformat 127 ns 126 ns 5600000
std_wformat_2 150 ns 151 ns 4977778
fmt_wformat 33.2 ns 33.0 ns 20363636
fmt_wformat_2 58.5 ns 58.6 ns 11200000
std_wformat_to 65.6 ns 65.6 ns 11200000
std_wformat_to_2 83.6 ns 83.7 ns 8960000
fmt_wformat_to 37.4 ns 36.8 ns 18666667
fmt_wformat_to_2 64.2 ns 64.2 ns 11200000
Looks like format(L"") needs some other perf work as well - fmt and swprintf are crushing us.