
std::format with std::chrono poor performance

Open nuuSolutions opened this issue 2 years ago • 6 comments

Is this a known issue or am I doing something sub-optimal?

I'm simply calling

std::format_to( std::back_inserter( buffer ), "[{:%H:%M:%S}] ", time );

where buffer is a std::string and time is of type std::chrono::system_clock::duration. This is about 4-5 times slower than 'rolling my own', e.g.

static constexpr auto h = 3600'000'000'000LL;
static constexpr auto min = 60'000'000'000LL;
static constexpr double sec = 1'000'000'000;
auto elapsed_ns = time.count();
const auto hours = elapsed_ns / h; elapsed_ns -= hours * h;
const auto minutes = elapsed_ns / min; elapsed_ns -= minutes * min;
const double seconds = elapsed_ns / sec;

std::format_to( std::back_inserter( buffer ), "[{}:{:02}:{:010.7f}] ", hours, minutes, seconds );

Not sure where this huge difference can come from. Allocations?
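
For anyone reproducing this, here is a minimal benchmark sketch of the comparison (iteration count and the sample value are arbitrary; it converts to nanoseconds explicitly since system_clock::duration's tick period is implementation-defined):

#include <chrono>
#include <format>
#include <iostream>
#include <iterator>
#include <string>

int main() {
    using namespace std::chrono;
    const auto time = duration_cast<system_clock::duration>(3h + 25min + 45s + 123456789ns);
    std::string buffer;

    const auto t0 = steady_clock::now();
    for (int i = 0; i < 100'000; ++i) {
        buffer.clear();
        std::format_to(std::back_inserter(buffer), "[{:%H:%M:%S}] ", time);
    }
    const auto t1 = steady_clock::now();

    for (int i = 0; i < 100'000; ++i) {
        buffer.clear();
        static constexpr auto h = 3600'000'000'000LL;
        static constexpr auto m = 60'000'000'000LL;
        static constexpr double s = 1'000'000'000;
        auto elapsed_ns = duration_cast<nanoseconds>(time).count();
        const auto hours = elapsed_ns / h;   elapsed_ns -= hours * h;
        const auto minutes = elapsed_ns / m; elapsed_ns -= minutes * m;
        const double seconds = elapsed_ns / s;
        std::format_to(std::back_inserter(buffer), "[{}:{:02}:{:010.7f}] ", hours, minutes, seconds);
    }
    const auto t2 = steady_clock::now();

    std::cout << "chrono spec: " << duration_cast<milliseconds>(t1 - t0) << '\n'
              << "hand-rolled: " << duration_cast<milliseconds>(t2 - t1) << '\n';
}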

nuuSolutions avatar Mar 15 '23 17:03 nuuSolutions

Thanks for the report - this is not a known issue so we'll need to investigate.

StephanTLavavej avatar Mar 15 '23 22:03 StephanTLavavej

The chrono formatter parses the entire spec into a vector and then routes all output through an ostringstream and in many cases put_time. That seems...unlikely to be performant.
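
Not the actual <chrono> implementation, just a sketch of the kind of path being described: per call, a stream gets constructed (allocating its buffer), the stream's locale is consulted, and put_time ultimately defers to strftime().

#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Illustrative only: each call builds an ostringstream (heap allocation),
// consults the stream's locale, and routes output through put_time, which
// is strftime-based on this implementation.
std::string format_via_stream(const std::tm& t, const char* spec) {
    std::ostringstream os;
    os << std::put_time(&t, spec);
    return std::move(os).str();
}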

timsong-cpp avatar Mar 16 '23 00:03 timsong-cpp

@StephanTLavavej As promised in my previous PRs, I've started work on seeing how time formatting can be optimized in the Microsoft STL. I will explain the main bottleneck of the routine again to give more context here.

Basically, this is slow because under the hood the UCRT strftime() function can currently only work with wchar_t strings. Naturally, this results in several heavy operations. The steps strftime() performs are as follows:

  • Allocates a wchar_t buffer of the same size (in characters, not bytes) as the input format string.
  • Converts the input format string to wchar_t using the buffer allocated in the previous step.
  • Allocates a wchar_t buffer for the result, based on the maxsize passed for the original char* output buffer.
  • Processes all specifiers and writes the output to the wchar_t buffer allocated in the step above.
  • Converts the wchar_t result back to char*, storing it in the original output buffer.
  • Frees both wchar_t buffers which were used for handling the format and result strings.

Of course, if the output buffer is not large enough, the function stops and returns zero, but all the allocation and conversion steps still happen (see the sketch below).
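
To make the cost concrete, the narrow strftime() effectively behaves like the following (a simplification of the steps listed above, not the actual UCRT source):

#include <cstdlib>
#include <cstring>
#include <ctime>
#include <cwchar>
#include <memory>

// Simplified illustration of the steps listed above; not the real UCRT code.
std::size_t narrow_strftime(char* out, std::size_t maxsize, const char* fmt, const std::tm* t) {
    // Steps 1-2: allocate a wchar_t buffer and widen the format string.
    const std::size_t fmt_len = std::strlen(fmt) + 1;
    auto wfmt = std::make_unique<wchar_t[]>(fmt_len);
    std::mbstowcs(wfmt.get(), fmt, fmt_len);

    // Steps 3-4: allocate a wchar_t result buffer and do the real work in wide.
    auto wout = std::make_unique<wchar_t[]>(maxsize);
    const std::size_t written = std::wcsftime(wout.get(), maxsize, wfmt.get(), t);
    if (written == 0) {
        return 0; // output buffer too small, but the allocations and conversions already happened
    }

    // Step 5: narrow the result back into the caller's buffer.
    std::wcstombs(out, wout.get(), maxsize);

    // Step 6: both temporary buffers are freed when the unique_ptrs go out of scope.
    return written;
}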

Now, if all these steps ran once for the whole format string, perhaps it wouldn't be that bad in terms of performance. However, the C++ specification of std::time_put<CharT,OutputIt>::put and std::time_put<CharT,OutputIt>::do_put requires each valid format specifier to be processed separately, so a format string as simple as "%H:%M:%S" causes strftime() to be called 3 times in a row, repeating all the heap allocations and string conversions each time. For more complex format strings the performance hit is N times worse and really noticeable.
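
In other words, the standard effectively forces a loop like this (a sketch of the required behavior, not our exact code; %E/%O modifier handling is elided for brevity):

#include <ctime>
#include <ios>
#include <iterator>
#include <locale>

// Each conversion specifier must go through its own (virtual) do_put call,
// so "%H:%M:%S" produces three independent strftime() round trips here.
std::ostreambuf_iterator<char> put_each_specifier(std::ostreambuf_iterator<char> out,
        std::ios_base& str, const std::tm& t, const char* first, const char* last) {
    const auto& tp = std::use_facet<std::time_put<char>>(str.getloc());
    while (first != last) {
        if (*first == '%' && last - first >= 2) {
            out = tp.put(out, str, ' ', &t, first[1], 0); // one strftime() per specifier
            first += 2;
        } else {
            *out++ = *first++; // literal characters are copied through directly
        }
    }
    return out;
}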

My original idea was to sanitize the format buffer by escaping characters that would crash the UCRT, then pass the whole thing directly to strftime(), effectively doing the heavy operations just once. However, this is not possible, as it would violate the C++ standard requirements I've explained above.

At this point I'm open to suggestions.

P.S. My best bet would still be to convince the UCRT team to fix up the implementation of strftime() to:

  • Not do any conversion work.
  • On invalid specifiers, do the same thing as other major implementations, so we can avoid a lot of additional processing and complexity on the STL side.

TheStormN avatar Sep 20 '24 22:09 TheStormN

BTW, it might be an optimization to simplify the operator<< overloads by not using format internally. #2003 is related.

frederick-vs-ja avatar Sep 26 '24 08:09 frederick-vs-ja

Even if we optimize operator<< it still ends up calling strftime() the way I've explained. Also this issue is about runtime performance, not compilation times.

TheStormN avatar Sep 26 '24 16:09 TheStormN

> Even if we optimize operator<< it still ends up calling strftime() the way I've explained. Also this issue is about runtime performance, not compilation times.

Did you really mean that we need to call strftime() (or its friends) even if the format string is known to be "%H:%M:%S"?

I think we can avoid the strftime() calls for it, because only the decimal point needs to be localized during output, and we only need to call decimal_point() once. It can also save runtime parsing of a statically determined format string by avoiding format.

For operator<<, IIUC we only need to call strftime() or its friends when outputting month and weekday names (due to "{:L%b}"/"{:L%a}").
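
A sketch of that fast path for "%H:%M:%S" (format_hms_localized is a hypothetical helper, not an existing STL function; it assumes a non-negative duration and fixed nanosecond precision):

#include <chrono>
#include <format>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: split the duration with integer math and touch the
// locale only for the decimal point, queried exactly once.
template <class Duration>
std::string format_hms_localized(Duration d, const std::locale& loc) {
    using namespace std::chrono;
    const auto h = duration_cast<hours>(d);
    const auto m = duration_cast<minutes>(d - h);
    const auto ns = duration_cast<nanoseconds>(d - h - m).count();
    // The only locale-dependent piece of "%H:%M:%S" output:
    const char point = std::use_facet<std::numpunct<char>>(loc).decimal_point();
    std::string out;
    std::format_to(std::back_inserter(out), "{:02}:{:02}:{:02}{}{:09}",
                   h.count(), m.count(), ns / 1'000'000'000, point, ns % 1'000'000'000);
    return out;
}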

frederick-vs-ja avatar Sep 26 '24 23:09 frederick-vs-ja

@frederick-vs-ja You are right, but what really happens is that strftime() is everywhere. :)

TheStormN avatar Oct 10 '24 12:10 TheStormN