ryu Feature request: Returning number of characters processed in ryu

ryu_parse does not return the number of characters processed, so we are forced to tokenize the input before feeding it to s2d. s2d already knows when to stop consuming the input at the first non-number character, so there ends up being a duplication of effort and parsing a list of numbers is slower than it should be.

Both std::strtod and std::from_chars return a pointer to the first character not matching the number pattern. ryu_parse should do the same.

With this proposal, the INPUT_TOO_LONG status code would become obsolete.

Jan 11 '20 18:01 ecorm

I propose that ryu_parse provides a drop-in replacement for strtod with the same behavior except that it's locale-independant.

Jan 11 '20 20:01 ecorm

@expnkx thanks for suggesting your library, but I only need low-level parsing primitives for a library I'm working on. I want the number parsing dependency for my library to be as small as possible, with std::strtod as default for users who don't care about speed or switching locales.

Jan 12 '20 04:01 ecorm

@expnkx I agree the API for std::strtod is terrible. I'm more concerned about the penalty of pre-tokenizing the number than the actual API. Matching the std::strtod API would make it simpler to choose between the standard library and Ryu implementations at compile time using preprocessor directives.

Jan 12 '20 05:01 ecorm

@expnkx The purpose of this issue is to improve this project, not to promote your library.

Jan 12 '20 05:01 ecorm

@expnkx I don't mind your suggestions for a better API within the scope of this project. What I do mind is your insistance to use your library instead of this one.

If you think nobody should be using this library, then please go away - you're not helping. This is an issue tracker specific to this project, not your personal soap box.

Jan 12 '20 05:01 ecorm

@ecorm, I'm not sure I follow. s2d_n is supposed to consume the entire input. I can imagine an API that consumes as much as possible and leaves it to the caller to check if the entire input is consumed, although I think it's a bit risky if there's a mismatch between the format expected here and the format expected one level up.

Jan 15 '20 09:01 ulfjack

@ulfjack Consuming as much as possible is exactly what std::strtod, std::from_chars, and double_conversion::StringToDoubleConverter::StringToDouble do (up to a maximum specified by the caller), and they return the number of characters consumed. When parsing a number from a larger string (such as a JSON document), having ryu_parse return the number of characters consumed avoids the need to look ahead and determine the number boundary, which hurts the speed advantage of using ryu_parse in the first place.

If you think returning the number of characters consumed would somehow hurt the performance, or you want to avoiding breaking the existing API, then I'd propose extra parsing functions that do return the number of characters consumed.

It turns out after all that I need to find the number boundary anyway within my JSON document, so I longer need the number of bytes consumed. However, I suggest that you leave this feature request open for a while to see if others might be interested in it.

Jan 15 '20 22:01 ecorm

Hi, I tweaked the s2d and s2f implementation for my needs, which involves returning the number of characters processed. I also removed the status code, and replaced them by asserts, which makes these functions unsafe, but I'd be happy to create a PR which only returns the number of character processed if you're interested.

https://github.com/rlespinet/ryu/commit/52db0bb30e4b5c66738203db69780a8f04db9639

Jun 13 '20 08:06 rlespinet

ryu
ryu copied to clipboard

Feature request: Returning number of characters processed in ryu_parse

ryu ryu copied to clipboard

Feature request: Returning number of characters processed in ryu_parse

ryu
ryu copied to clipboard