Idea: add string match for easily handling sub-string. [merged, looking for feedback]

Open ccqpein opened this issue 2 years ago • 10 comments

I was wondering whether cl-str could "pattern match" a string, like the match/case constructs in some other languages.

So I wrote my own version (see the example below). What do you think? Does it fit cl-str's purpose? I checked the docs and there is already a string-case. Should I change the name of the macro? Thanks!

(defun expand-match-branch (str block patterns forms)
  (case patterns
    ((t otherwise) `(progn ,@forms))
    (t (loop with regex = '("^")
            and vars = '()
            for x in patterns
            do (cond ((stringp x)
                      (push x regex))
                     ((symbolp x)
                      (push "(.*)" regex)
                      (push x vars))
                     (t (error "only symbol and string allowed in patterns")))
            finally (push "$" regex)
            finally (return (let ((whole-str (gensym))
                                  (regs (gensym)))
                              `(multiple-value-bind (,whole-str ,regs)
                                   (cl-ppcre:scan-to-strings
                                    ,(apply #'str:concat (reverse regex))
                                    ,str)
                                 (declare (ignore ,whole-str))
                                 (when ,regs
                                   (let ,(reverse vars)
                                     ,@(loop for ind from 0 below (length vars)
                                             collect `(setf ,(nth ind (reverse vars))
                                                            (elt ,regs ,ind)))
                                     (return-from ,block
                                       (progn ,@forms)))))))))))

(defmacro str-match (str &rest match-branches)
  (let ((block-sym (gensym)))
    `(block ,block-sym
       ,@(loop for statement in match-branches
               collect (expand-match-branch
                        str
                        block-sym
                        (nth 0 statement)
                        (cdr statement))))))

CL-USER> (macroexpand-1 '(str-match sss
                           (("a" b "c") (parse-integer b))
                           (("a" x "c" y "b")
                            (print (parse-integer x))
                            (print (parse-integer y))
                            (list (parse-integer x) (parse-integer y)))
                           (t (print "aa"))))
(BLOCK #:G415
  (MULTIPLE-VALUE-BIND (#:G416 #:G417)
      (CL-PPCRE:SCAN-TO-STRINGS "^a(.*)c$" SSS)
    (DECLARE (IGNORE #:G416))
    (WHEN #:G417
      (LET (B)
        (SETF B (ELT #:G417 0))
        (RETURN-FROM #:G415 (PROGN (PARSE-INTEGER B))))))
  (MULTIPLE-VALUE-BIND (#:G418 #:G419)
      (CL-PPCRE:SCAN-TO-STRINGS "^a(.*)c(.*)b$" SSS)
    (DECLARE (IGNORE #:G418))
    (WHEN #:G419
      (LET (X Y)
        (SETF X (ELT #:G419 0))
        (SETF Y (ELT #:G419 1))
        (RETURN-FROM #:G415
          (PROGN
           (PRINT (PARSE-INTEGER X))
           (PRINT (PARSE-INTEGER Y))
           (LIST (PARSE-INTEGER X) (PARSE-INTEGER Y)))))))
  (PROGN (PRINT "aa")))
T
CL-USER> (str-match "a1c5b"
           (("a" b "c") (parse-integer b))
           (("a" x "c" y "b")
            (print (parse-integer x))
            (print (parse-integer y))
            (list (parse-integer x) (parse-integer y)))
           (t (print "aa")))

1 
5 
(1 5)

ccqpein, Dec 29 '23 19:12

Nice, that is pretty interesting.

With some indentation the snippet becomes


(str-match "a1c5b"
           (("a" b "c")
            (parse-integer b))
           (("a" x "c" y "b")
            (print (parse-integer x))
            (print (parse-integer y))
            (list (parse-integer x) (parse-integer y)))
           (t (print "aa")))

so by using &body instead of &rest we get this indentation:


(str-match "a1c5b"
  (("a" b "c")
   (parse-integer b))
  (("a" x "c" y "b")
   (print (parse-integer x))
   (print (parse-integer y))
   (list (parse-integer x) (parse-integer y)))
  (t (print "aa")))
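
For the language itself, &body behaves exactly like &rest; the only difference is a hint to editors and pretty-printers that the remaining forms are a body and should be indented as one. A minimal sketch with two toy macros (with-rest-demo and with-body-demo are hypothetical names used only for this comparison, not part of cl-str):

```lisp
;; Hypothetical toy macros, only to compare the two lambda-list keywords.
(defmacro with-rest-demo (tag &rest forms)
  `(progn ,tag ,@forms))

(defmacro with-body-demo (tag &body forms)
  `(progn ,tag ,@forms))

;; Both expand to the same code; only the indentation hint differs.
(equal (macroexpand-1 '(with-rest-demo :a (print 1)))
       (macroexpand-1 '(with-body-demo :a (print 1))))
;; => T
```

So switching str-match from &rest to &body should not change its expansion, only how editors such as Emacs with SLIME or Sly indent calls to it.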

Would you not use the Trivia library for pattern matching? It probably does this, and more.

Which pattern-matching features will users ask for once we introduce this one?

> like some other languages' match case.

What are your favourite examples?

(and yes "string-match" might be better)

vindarel, Jan 03 '24 11:01

> so by using &body instead of &rest we get this indentation:

Nice catch!

> Would you not use the Trivia library for pattern matching? It probably does this, and more.

Gonna check it now.

ccqpein, Jan 04 '24 03:01

I checked Trivia. It looks good for pattern matching on a list, like:

(trivia:match '(1 2 3)
  ((list* 1 x _)
   x)
  ((list* _ x)
   x)) ;; => 2

But I run into an issue with string patterns. I am not sure whether that is because I am using SBCL (maybe because of this?).

Besides, I can match the whole string, like:

(trivia:match "a1c5b" ("a1c5b" 1))
;; or
(trivia:match "ab" ((vector #\a #\b) 1))

but not this:

(trivia:match "a1c5b" ((string "a1c" "5b") 1))

So it looks like I can only bind characters, not substrings as in my proposal.

ccqpein, Jan 05 '24 00:01

Let's use and try this macro. I'm interested in everybody's feedback.

A stupid test: I match as in your example, but I don't use the matched variables, so I get style warnings:

(match "a1c5b"
  (("a" i "c")
   (print "got axc"))
  (("a" x "c" y "b")
   (print "got axcyb"))
  (t (print "default")))
;; =>
;; ;   The variable I is assigned but never read.
;; (and for x and y)

Would it be possible to avoid the warnings? Using a _ placeholder?
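
One possible way to support such a placeholder, shown here only as a sketch and not as the code that ended up in the PR: treat `_` as a non-capturing group, so that no variable is bound and there is nothing for the compiler to warn about. The helper name pattern->regex-and-vars is hypothetical, and the sketch keeps the greedy `.*` of the original macro.

```lisp
;; Hypothetical helper: turn a match pattern into a regex string plus the
;; list of variables that need to be bound to its capture groups.
(defun pattern->regex-and-vars (pattern)
  "Return two values: the regex for PATTERN and the variables to bind.
Strings are used as regex fragments, `_` matches without binding, and any
other symbol becomes a capture group."
  (let ((parts '())
        (vars '()))
    (dolist (x pattern)
      (cond ((stringp x)
             (push x parts))
            ;; Compare by name so that `_` works from any package.
            ((and (symbolp x) (string= (symbol-name x) "_"))
             (push "(?:.*)" parts))
            ((symbolp x)
             (push "(.*)" parts)
             (push x vars))
            (t (error "only symbols and strings are allowed in patterns"))))
    (values (format nil "^~{~a~}$" (reverse parts))
            (reverse vars))))

(pattern->regex-and-vars '("a" _ "c" y "b"))
;; => "^a(?:.*)c(.*)b$", (Y)
```

Because only Y is returned, an expansion built on this helper would bind Y to the first (and only) capture group and never create a variable for the placeholder.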

vindarel, Jan 23 '24 14:01

Yes, I just tried it on my side. I will open a PR soon.

ccqpein, Jan 24 '24 02:01

Opened PR #114.

ccqpein, Jan 25 '24 04:01

I tried this more on an AOC problem (day 19), and OMG this match macro felt so powerful. Easier and faster than searching for the right regexp.

vindarel, Jan 30 '24 17:01

Other quick test:

(str::match "123 hello 456"
  (("\\d+" s "\\d+")
   s)
  (t "nothing"))
;; => " hello 45"

I didn't expect to see "45". The first number regex was correctly matched, not the second?

(str::match "123 hello 456"
  (("\\d+" s "\\d*")
   s)
  (t "nothing"))
;; => " hello 456"

Here I didn't expect "456".

vindarel, Jan 30 '24 17:01

@vindarel I just figured out that fixing this requires a non-greedy regex. I fixed it in the latest commit. Good catch!
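
The difference can be reproduced outside the macro with plain cl-ppcre. The sketch below assumes cl-ppcre is loaded (for example via Quicklisp); first-group is a hypothetical helper written for this illustration, and the regexes mirror what the macro would build from the patterns in the test above.

```lisp
;; Assumes cl-ppcre is available, e.g. after (ql:quickload "cl-ppcre").
(defun first-group (regex string)
  "Return the first register group of REGEX matched against STRING, or NIL."
  (multiple-value-bind (whole groups)
      (cl-ppcre:scan-to-strings regex string)
    (when whole
      (elt groups 0))))

;; Greedy: (.*) takes as much as it can and leaves a single digit for \d+.
(first-group "^\\d+(.*)\\d+$" "123 hello 456")   ; => " hello 45"

;; Non-greedy: (.*?) takes as little as it can, so \d+ gets all of "456".
(first-group "^\\d+(.*?)\\d+$" "123 hello 456")  ; => " hello "

;; The same holds when the trailing number is optional.
(first-group "^\\d+(.*?)\\d*$" "123 hello 456")  ; => " hello "
```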

ccqpein, Jan 31 '24 02:01

PR #114 is merged. I am not sure whether we should keep this idea issue open for potential future changes. I leave that decision to the repo owner.

ccqpein, Feb 09 '24 01:02