red

Question on the behavior of parsing integers

Open ALANVF opened this issue 3 years ago • 2 comments

  1. to integer! "3.4e2" returns 3, despite the fact that it is formatted as a float

  2. to integer! "3e2" returns an error because of e, despite the fact that it's the same thing without a decimal point

  3. to integer! "3 foo" returns an error because of foo, despite the fact that it is separated from the 3

Are all 3 of these behaviors intentional? In R2 and R3:

  • #1 returns 340, because the string is parsed as a float and then converted to an integer.
  • #2 returns 300, because the string is parsed as a float (without a decimal point) and then converted to an integer.
  • #3 also errors.

Because #3 errors, the . must intentionally stop integer parsing in Red; otherwise 3.4e2 would also error, since . is not a digit.
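To make that inference concrete, here is a minimal Python sketch of the behavior reported above. It is only a model of the three observations in this issue, not Red's actual scanner; the function name `scan_integer_observed` is hypothetical, and treating the space in "3 foo" as the offending character is an assumption.

```python
DIGITS = "0123456789"

def scan_integer_observed(text: str) -> int:
    """Model of the reported behavior: digits are consumed, '.' silently
    ends the scan, and any other character is an error.
    (Hypothetical helper; not Red's implementation.)"""
    digits = ""
    for ch in text:
        if ch in DIGITS:
            digits += ch
        elif ch == ".":
            # The dot stops the scan without raising, which is why
            # "3.4e2" yields 3 instead of an error.
            break
        else:
            raise ValueError(f"unexpected character {ch!r} in {text!r}")
    if not digits:
        raise ValueError(f"no digits in {text!r}")
    return int(digits)
```

Under this model "3.4e2" gives 3, while "3e2" and "3 foo" both raise, matching cases 1 to 3.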

So now the question is: do we follow the behavior of R2/R3, stick with the current behavior, or try to clarify the current behavior by fixing/simplifying it?

ALANVF, May 25 '22 21:05

options:

  • consume just digits, error out if something else encountered
  • consume just digits, no errors if at least one found
  • consume digits and ' (as now), but no errors if at least one found
  • load just float/integer, convert loaded thing to integer
  • load any number, convert loaded thing to integer
  • load any content, convert loaded thing to integer

hiiamboris, May 25 '22 22:05
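The first three options in the list above differ only in two switches: whether ' is skipped, and whether a foreign character ends the scan or raises. A rough Python sketch, assuming those semantics (the name `scan_digits` and its keyword flags are hypothetical and exist only for illustration):

```python
DIGITS = "0123456789"

def scan_digits(text: str, *, skip_quote: bool = False, lenient: bool = False) -> int:
    """Illustrative model of the digit-scanning options.

    skip_quote: accept and skip the ' separator.
    lenient:    stop quietly at the first foreign character, provided
                at least one digit has already been consumed.
    """
    digits = ""
    for ch in text:
        if ch in DIGITS:
            digits += ch
        elif skip_quote and ch == "'":
            continue  # separator is accepted but contributes nothing
        elif lenient and digits:
            break     # "no errors if at least one found"
        else:
            raise ValueError(f"unexpected character {ch!r} in {text!r}")
    if not digits:
        raise ValueError(f"no digits in {text!r}")
    return int(digits)

# Option 1: scan_digits(s)
# Option 2: scan_digits(s, lenient=True)
# Option 3: scan_digits(s, skip_quote=True, lenient=True)
```

With these definitions, option 1 rejects "3e2" outright, while options 2 and 3 both return 3 for it; only option 3 reads "1'000" as 1000.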

to on integer!, float! and tuple! does not load the string; it performs a very fast scan to convert a literal integer in string representation to an integer value (other scalar types should be added in the future). If a smarter conversion is needed, the value can be loaded first:

>> 3e2
== 300.0
>> to-integer load "3.4e2"
== 340
>> to-integer load "3e2"
== 300
>> to-integer first load "3 foo"
== 3

In Rebol, load is always implied when to's second argument is a string, so the user has no option for a faster conversion for scalars than invoking the lexer engine.
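For contrast with the fast scan, the load-then-convert route can be modeled in Python as follows. This is a sketch under assumptions: Python's built-in `float()` stands in for the lexer, taking the first whitespace-separated token mimics `first load`, and the name `to_integer_via_load` is hypothetical.

```python
def to_integer_via_load(text: str) -> int:
    """Model of 'load first, then convert': parse the first token as a
    number with a full numeric parser, then truncate to an integer.
    (float() is only a stand-in for Red's lexer.)"""
    tokens = text.split()
    if not tokens:
        raise ValueError("nothing to load")
    # Full numeric parse, so exponents and decimal points are understood.
    return int(float(tokens[0]))
```

This reproduces the results in the console session above (340, 300 and 3), at the cost of running a general-purpose parser instead of a specialized digit scan.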

In Red's old scanner, the performance difference between the lexer and such specialized tokenizing routines was huge. With the new one, I can't remember how big the difference was; I just remember testing it and concluding that the specialized routine was worth keeping. Feel free to recheck that, though.

I think we can stick to the "consume just digits, error out if something else encountered" rule.

dockimbel, May 27 '22 16:05

> I think we can stick to the "consume just digits, error out if something else encountered" rule.

Implemented. Besides digits, only the ' character is now accepted (and skipped) by to-integer.

dockimbel, Aug 30 '22 16:08