
Regular expression types?

Open jackfirth opened this issue 5 years ago • 6 comments

Something like this:

(define-regular-expression-type memory-usage-row
  #px"\\s*(\\S+):\\s+(\\d+)\\s+(\\d+)"
  (usage-type current-usage cumulative-usage))

(define memory-dump
#<<END
       <variable-code>:       3307     105824
    <application-code>:      14761    1007952
  <unary-application-c:      18140     580480
  <binary-application-:      19174     766960
END
  )

> (match-memory-usage-row memory-dump)
(memory-usage-row "<variable-code>" 3307 105824)

> (sequence->list (in-matched-memory-usage-rows memory-dump))
(list (memory-usage-row "<variable-code>" 3307 105824)
      (memory-usage-row "<application-code>" 14761 1007952)
      (memory-usage-row "<unary-application-c" 18140 580480)
      (memory-usage-row "<binary-application-" 19174 766960))

Use case came up when I was trying to parse some output from (dump-memory-stats).

jackfirth avatar Nov 15 '19 09:11 jackfirth

Note: needs to be smart enough to parse (\\d+) into a number? instead of a string of digits.

jackfirth avatar Nov 15 '19 09:11 jackfirth

I like this idea, but having to hack apart regular expressions is annoying. It would be easier if there were already an SRE-like layer and if Racket regexps supported named subpatterns.

To handle numbers (and other cases), perhaps modify your initial example to:

(define-regular-expression-type memory-usage-row
  #px"\\s*(\\S+):\\s+(\\d+)\\s+(\\d+)"
  (usage-type [current-usage string->number] [cumulative-usage string->number]))
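For comparison, here is a rough hand-written sketch of what such a form might expand to, using only racket/base and racket/list. The names match-memory-usage-row and matched-memory-usage-rows are taken from or modeled on the proposal, not an existing API, and the real macro could well look different.

```racket
#lang racket/base
(require racket/list)

;; Hypothetical hand-written equivalent of the proposed form.
(struct memory-usage-row (usage-type current-usage cumulative-usage)
  #:transparent)

(define memory-usage-row-pattern
  #px"\\s*(\\S+):\\s+(\\d+)\\s+(\\d+)")

;; Builds a row from the list of captured groups, applying the
;; per-field converters (string->number for the two counters).
(define (groups->memory-usage-row groups)
  (memory-usage-row (first groups)
                    (string->number (second groups))
                    (string->number (third groups))))

;; Returns a row for the first match in str, or #f if nothing matches.
(define (match-memory-usage-row str)
  (define m (regexp-match memory-usage-row-pattern str))
  (and m (groups->memory-usage-row (cdr m))))

;; Returns a list of rows, one per match in str.
(define (matched-memory-usage-rows str)
  (map groups->memory-usage-row
       (regexp-match* memory-usage-row-pattern str #:match-select cdr)))
```

Returning #f on a failed match is just a placeholder choice here; the failure behavior is a separate question.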

samdphillips avatar Nov 15 '19 16:11 samdphillips

Being able to generate a record (from rebellion/collection/record) would be nice as well.

samdphillips avatar Nov 15 '19 19:11 samdphillips

API needs to define the behavior when a pattern match fails. Ideas:

  • error
  • use present and absent
  • return #f (or #f element for sequence generating)
  • use a failure thunk similar to hash-ref and friends
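As a sketch of the last option, a matcher could take an optional failure-result argument that behaves like the one on hash-ref: called if it is a procedure, returned as-is otherwise, with an error as the default. The name match-groups is made up for illustration.

```racket
#lang racket/base

;; Hypothetical helper, not part of any library.
;; Returns the captured groups of the first match of pattern in str.
;; On failure, calls failure-result if it is a procedure, otherwise
;; returns it; the default raises a contract error.
(define (match-groups pattern str
                      [failure-result
                       (lambda ()
                         (raise-arguments-error 'match-groups
                                                "no match for pattern"
                                                "pattern" pattern
                                                "input" str))])
  (define m (regexp-match pattern str))
  (cond
    [m (cdr m)]
    [(procedure? failure-result) (failure-result)]
    [else failure-result]))
```

With this shape, passing #f as the third argument gives the "return #f" option, and omitting it gives the "error" option.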

samdphillips avatar Nov 16 '19 01:11 samdphillips

On failure, would you want information about why the match failed? Maybe using result objects instead of present and absent would be the way to go.

jackfirth avatar Nov 16 '19 07:11 jackfirth

I think using a result is also a valid choice on its face, but Racket's regex engine (like most others) provides little useful failure information beyond "the match failed".

That said, I could see a case where the failure branch carries the input that failed to match as its error, which could avoid a lot of threading acrobatics. Example:

(define-regular-expression-type
  the-stuff-i-want <pat> <fields>)

(define (log-failures a-result)
  (result-case 
    a-result
    #:success (lambda (v) #t)
    #:failure (lambda (e) (log e) #f)))

(transduce (in-lines data-in)
           (mapping match-the-stuff-i-want)
           (filtering log-failures)
           ...)
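To make the idea of a failure that carries the unmatched input concrete, here is a minimal self-contained sketch in plain racket/base. The match-success and match-failure structs are stand-ins for result objects, and row-pattern, match-row, and partition-matches are hypothetical names; a real version would presumably build on rebellion's result type and transducers instead.

```racket
#lang racket/base

;; Stand-ins for result objects (assumption: not the rebellion API).
(struct match-success (value) #:transparent)
(struct match-failure (input) #:transparent)

;; Anchored so that each line either matches as a whole or fails.
(define row-pattern #px"^\\s*(\\S+):\\s+(\\d+)\\s+(\\d+)\\s*$")

;; Wraps the captured groups on success; on failure, keeps the
;; offending line so the caller can log it without extra bookkeeping.
(define (match-row line)
  (define m (regexp-match row-pattern line))
  (if m
      (match-success (cdr m))
      (match-failure line)))

;; Splits lines into matched group lists and the raw lines that failed.
(define (partition-matches lines)
  (define results (map match-row lines))
  (values (map match-success-value (filter match-success? results))
          (map match-failure-input (filter match-failure? results))))
```

Because the failure holds the original line, the logging step only needs the result itself and not a parallel stream of inputs.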

samdphillips avatar Nov 16 '19 18:11 samdphillips