fast_float icon indicating copy to clipboard operation
fast_float copied to clipboard

Improwing speed and reduce code size when fast_float is using as internal parser code.

Open IRainman opened this issue 9 months ago • 25 comments

🚀 Performance and Code Size Improvements

  • Optimized parsing logic for use as an internal lightweight parser (for example, inside other libraries).
  • Reduced binary size by conditionally excluding unneeded parsing features.
  • The from_chars_result_t structure is reduced to 4 bytes for better memory efficiency.
  • Improved the parsed_number_string_t layout and increased constexpr/consteval propagation to enable compile-time optimizations.
  • Reduced register pressure and branching in parsing hot paths.

⚙️ New Configuration Macros

Introduced new optional macros to minimize overhead when certain parsing features are not required:

  • FASTFLOAT_ONLY_POSITIVE_C_NUMBER_WO_INF_NAN Restricts parsing to positive C-style numbers only — no sign characters, no INF, INFINITY, or NaN and no any additional optional like skip white spaces or support for Fortran or JSON.

  • FASTFLOAT_ONLY_ROUNDS_TO_NEAREST_SUPPORTED Assumes that only the IEEE 754 “round-to-nearest” rounding mode is needed, removing crutches support code for other modes.

  • FASTFLOAT_ISNOT_CHECKED_BOUNDS Disables bounds checking for input ranges that are assumed to be valid. Use only when inputs are guaranteed safe — this reduces branching and slightly improves performance.

  • FASTFLOAT_ASSUME Provides a portable abstraction for the compiler’s [[assume]] intrinsic.

  • FASTFLOAT_HAS_BYTESWAP Uses std::byteswap if awailable to reduce code size..


🧩 Remove Deprecated Macros

  • Removed obsolete macroses FASTFLOAT_ALLOWS_LEADING_PLUS and FASTFLOAT_SKIP_WHITE_SPACE, which are now superseded by the stricter limited-mode options above.

🧠 Internal Refactoring and Quality Improvements

  • Simplified internal parsing logic with compile-time branching controlled by the new macros.
  • Enhanced constexpr coverage across the codebase for greater compile-time evaluation.
  • Added 32-bit build support and verified portability across compilers.
  • Updated benchmarks and tests to validate new restricted configurations.
  • Fixed minor compiler warnings and addressed PVS-Studio static analysis feedback.

📘 Documentation and Build

  • Updated README.md to describe all new configuration macros and their intended use cases.
  • Adjusted build scripts and benchmark configurations for the new compile-time flags.

Result: Smaller, faster, and more configurable builds — particularly suited for internal numeric parsers or embedded environments — while maintaining full compatibility and functionality in the default configuration.

IRainman avatar Mar 05 '25 20:03 IRainman

@IRainman

This both allow to significantly reduce code size and speedup your code when you use fast_float as a part of your parser.

I am skeptical. Your claims are not documented or quantified.

lemire avatar Mar 05 '25 20:03 lemire

@IRainman

This both allow to significantly reduce code size and speedup your code when you use fast_float as a part of your parser.

I am skeptical. Your claims are not documented or quantified.

OK, I'll add some additional tests, related to parsing only positive numbers and positive infinite. It's really a common case in any mathematical parser. Maybe I should take a more aggressive approach and disable infinity too. In theory, this defines should not be needed with constexpr if... Actually, one big question: do you plan to allow the compile time config options? Because current realisation parse options in fully software mode and this is bad.

IRainman avatar Mar 05 '25 20:03 IRainman

I recently changed some compile time config options into runtime config options. I believe that both worlds would be good. I would prefer if the compile time options to not be any ifdef-macros, but controlled by using template arguments.

dalle avatar Mar 05 '25 21:03 dalle

In this instance, the argument here is to reduce compiled size and improve runtime performance we avoid checking negatives ('-') and for 'nan'/'inf'. In my view, it is an extraordinary claim that checking for the negative ('-') and for 'nan'/'inf' makes a large difference.

It is possibly true, but I'd like to see the numbers. Please note that we have do have benchmarks included in the library.

Because current realisation parse options in fully software mode and this is bad.

How did you benchmark to arrive at the conclusion that it is bad?

I recently changed some compile time config options into runtime config options.

I would not assume that it makes a difference to the performance unless we have a hard data. Some parameters can be passed as template arguments, certainly...

But we need to take into account that parsing a single number takes hundreds of instructions. Predictable branches are not a big concern.

lemire avatar Mar 05 '25 22:03 lemire

I recently changed some compile time config options into runtime config options. I believe that both worlds would be good. I would prefer if the compile time options to not be any ifdef-macros, but controlled by using template arguments.

Very well! I'm currently reworking my patch to use templates argument and want to add more constexpr / consteval to the code.

IRainman avatar Mar 06 '25 12:03 IRainman

In this instance, the argument here is to reduce compiled size and improve runtime performance we avoid checking negatives ('-') and for 'nan'/'inf'. In my view, it is an extraordinary claim that checking for the negative ('-') and for 'nan'/'inf' makes a large difference.

It is possibly true, but I'd like to see the numbers. Please note that we have do have benchmarks included in the library.

I'm measuring performance on the mathematical parser, that already processes minus signs, and I yesterday rewrote it to also process inf / infinity for mathematical questions. In mathematical parser nan isn't possible at all because if it's nan the parser stops processing equations immediately.

Because current realisation parse options in fully software mode and this is bad.

How did you benchmark to arrive at the conclusion that it is bad?

Options it's not fully processed by the template and generate many additional lines of code. Also complete removing the minus sign for mantissa and inf/nan from fast_float significantly reduces generated assembler.

To check the performance of such a case, tests should contain only positive numbers in general notation.

IRainman avatar Mar 06 '25 13:03 IRainman

I recently changed some compile time config options into runtime config options. I believe that both worlds would be good. I would prefer if the compile time options to not be any ifdef-macros, but controlled by using template arguments.

Very well! I'm currently reworking my patch to use templates argument and want to add more constexpr / consteval to the code.

Please don't do this until we have the performance comparisons on the gains.

dalle avatar Mar 06 '25 13:03 dalle

I recently changed some compile time config options into runtime config options. I believe that both worlds would be good. I would prefer if the compile time options to not be any ifdef-macros, but controlled by using template arguments.

Very well! I'm currently reworking my patch to use templates argument and want to add more constexpr / consteval to the code.

Please don't do this until we have the performance comparisons on the gains.

Of course, I just cleaned my code of defines and replaced them with an option. Unfortunately, code isn't fully clean-up at compile time as a preprocessing macro.

IRainman avatar Mar 06 '25 15:03 IRainman

P. S. my current version of the test code is: [[assume(_view._Unchecked_end() - _view._Unchecked_begin() >= 1)]]; double val; constexpr auto options{fast_float::chars_format::general | fast_float::chars_format::no_infnan | fast_float::chars_format::disallow_leading_sign}; const auto res = fast_float::from_chars(_view._Unchecked_begin(), _view._Unchecked_end(), val, options); [[assume(res.ptr - _view._Unchecked_begin() >= 1)]]; const size_t n = res.ptr - _view._Unchecked_begin();

IRainman avatar Mar 06 '25 15:03 IRainman

I'm sorry to interrupt you, but I think you're going the wrong way here and focusing on the wrong stuff. I don't want to seem pompous or ignorant, and I don't want to offend you, I just don't want you to (possibly) waste more time on small changes.

I will say it clearly: Without proof of the gains there is very low chances that PR will be merged.

So please focus on the tests, make the performance/benchmarks to compile and run. Then you should run the performance/benchmarks workflows. If there are quantified improvements then the likelihood of merging this PR will increase substantially.

dalle avatar Mar 09 '25 00:03 dalle

I agree with @dalle.

It is fine to fork the library for your own purposes… it is open source. You can adapt it to your own needs. Please do so.

But we won’t take this PR unless we are convinced of its benefits. And you have not provided the evidence.

We are concerned about maintaining the code, keeping it correct and well tested.

Changes must be motivated.

lemire avatar Mar 09 '25 00:03 lemire

I added an improvement test with parser emulation.

P. S. In real code of mathematical processing engine, my patches also give significant improvements (this is from MSVC, not tested on the other compilers):

std::from_chars

Tests: exactly: 115, almost: 9, failed: 4, time is: 32064ms.

fast_float::from_chars

Tests: time is: 26684ms.

( commits from this PR )

Tests (100% of CPU cores is used by BOINC): time is: 43135ms. Tests (50% of CPU cores is used by BOINC): time is: 28898ms. Tests (BOINC is stopped): time is: 25459ms.

When switch FASTFLOAT_ONLY_POSITIVE_C_NUMBER_WO_INF_NAN is applied:

Tests: exactly: 115, almost: 9, failed: 4, time is: 25947ms.

( commits from this PR )

Tests (100% of CPU cores is used by BOINC): time is: 39588ms. Tests (50% of CPU cores is used by BOINC): time is: 27301ms. Tests (BOINC is stopped): time is: 24597ms.

IRainman avatar Mar 09 '25 17:03 IRainman

Additionally, you can also test the compilation flag FASTFLOAT_ONLY_POSITIVE_C_NUMBER_WO_INF_NAN with a file from here https://population.un.org/wpp/downloads?folder=Standard%20Projections&group=CSV%20format — this is a CSV with only positive numbers in there, for example this https://population.un.org/wpp/assets/Excel%20Files/1_Indicator%20(Standard)/CSV_FILES/WPP2024_TotalPopulationBySex.csv.gz

IRainman avatar Mar 09 '25 17:03 IRainman

@IRainman We already have a dataset made entirely of positive values (mesh), see https://github.com/fastfloat/fast_float/pull/308

If you prepare a file that contains just one number per line, you can then pass it to our benchmark like so...

cmake -B build -D FASTFLOAT_BENCHMARKS=ON
cmake --build build
./build/benchmarks/realbenchmark myfile.txt

(This assumes Linux/macOS... The instructions are similar under Visual Studio except that you need to specify that you work in release mode.)

lemire avatar Mar 09 '25 19:03 lemire

Hmm, I tried to update the brunch, and it's successfully updated, but now this PR isn't updated. What do I need to do? Close this and create a new PR after I finish changes, or may be some command for this situation exists?

IRainman avatar Mar 12 '25 14:03 IRainman

Hmm, I tried to update the brunch, and it's successfully updated, but now this PR isn't updated. What do I need to do?

I am not certain of what you mean. The content of a PR a determined by the content of a branch.

lemire avatar Mar 12 '25 16:03 lemire

I am not certain of what you mean. The content of a PR a determined by the content of a branch.

Now it's probably fine, thank you. I see my new commit in this PR after.

IRainman avatar Mar 12 '25 16:03 IRainman

Well, I don't know how to compile basictest.cpp locally, and currently I'm done because it's not any ability to compile tests locally on Windows. If you can upgrade the tests I will try again.

IRainman avatar Mar 12 '25 18:03 IRainman

Now I have added additional improvements, fixed a warning, and made tests in Linux work again.

IRainman avatar Mar 24 '25 10:03 IRainman

@IRainman We cannot merge this as-is. Please break down your PR into smaller contributions, starting from the less consequential.

lemire avatar Mar 25 '25 14:03 lemire

@lemire @dalle , can I compile all CI tests locally on the windows?

IRainman avatar Apr 12 '25 14:04 IRainman

1.5-2% in the base, and additional more than 10% with defines. image

IRainman avatar May 05 '25 18:05 IRainman

I'm probably done, you can review and do some speed tests. Now all tests are working. Can you please give me advice, what test I need to add for my changes?

IRainman avatar May 07 '25 19:05 IRainman

I make tests on the Linux machine:

original.txt pr307.txt pr307_with_flags.txt

image image

IRainman avatar Jun 14 '25 19:06 IRainman

@lemire please refer to the numbers in the comment above this.

IRainman avatar Jun 14 '25 20:06 IRainman