crystal icon indicating copy to clipboard operation
crystal copied to clipboard

RFC: E notation BigDecimal parser

Open stevegeek opened this issue 4 years ago • 5 comments

Related to : #9580

This implements a first pass at this (not stack based parser, to avoid needing a stack, just a translation of the parse table into logic)

PR which is still a work in progress and would benefit from your inputs should we wish to progress with this

created with the productions below (syntax below works in http://hackingoff.com/compilers/ll-1-parser-generator)

Scientific -> OptionalSign Number
OptionalSign -> s |
Number -> Digits OptionalFractional OptionalExponent | Fractional OptionalExponent
OptionalFractional -> '.' OptionalDigits  
Fractional -> '.' Digits
OptionalExponent -> e OptionalSign Digits | 
OptionalDigits -> Digits |  
Digits -> Digit RepeatDigit
RepeatDigit -> Digit |  
Digit -> n 

stevegeek avatar Jul 06 '20 15:07 stevegeek

Out of curiosity, what are perf gains?

Sija avatar Jul 07 '20 01:07 Sija

@sija things are slower at the moment:

require "benchmark"

Benchmark.ips do |x|
  x.report("ex1 - long") do
    BigDecimal.new("+0829756293456064502345702345.283457863284759365e-7364756823")
  end

  x.report("ex2 - long error") do
    BigDecimal.new("138479783434.927347264629873647269348762e82347978e")
  rescue
    #
  end

  x.report("ex3 - short") do
    BigDecimal.new("1.1")
  end

  x.report("ex3 - short error") do
    BigDecimal.new("1.1-")
  rescue
    #
  end
end

On current master:

MRc:crystal stephen$ ./bin/crystal run --release bdp.cr 

       ex1 - long 781.23k (  1.28µs) (± 0.90%)  672B/op   4.19× slower
 ex2 - long error 434.36k (  2.30µs) (± 1.56%)  785B/op   7.54× slower
      ex3 - short   3.27M (305.53ns) (± 1.15%)  464B/op        fastest
ex3 - short error 578.83k (  1.73µs) (± 1.59%)  464B/op   5.65× slower

this:

MRc:crystal stephen$ bin/crystal run --release bdp.cr
       ex1 - long 398.11k (  2.51µs) (± 1.12%)   1.2kB/op   4.99× slower
 ex2 - long error 275.56k (  3.63µs) (± 1.20%)  1.27kB/op   7.20× slower
      ex3 - short   1.99M (503.74ns) (± 1.52%)    784B/op        fastest
ex3 - short error 405.12k (  2.47µs) (± 1.45%)  1.04kB/op   4.90× slower

stevegeek avatar Jul 07 '20 14:07 stevegeek

Apologies for the long delay on this... I will review if still relevant and fix up PR if needed (else close)

stevegeek avatar Mar 11 '22 08:03 stevegeek

Are there cases that this new parser can handle and the original parser can't?

lbguilherme avatar Mar 11 '22 10:03 lbguilherme

@lbguilherme The original motivation was the current implementation seems somewhat ad-hoc and contained a number of parse bugs (see #9547) mainly around invalid notation. I fixed all the issues I could find in #9577 but thought maybe a refactored parser would be better. This one was created from a stricter grammar for E notation so potentially could lead to a more robust parser of the notation.

I see however that since this was created, #10641 has been discussed, so maybe these changes are not relevant anymore in the bigger picture of a rewrite of BigDecimal.

stevegeek avatar Mar 11 '22 13:03 stevegeek