==================
 ample-regexps.el
==================
.. image:: https://travis-ci.org/immerrr/ample-regexps.el.svg?branch=master
   :target: https://travis-ci.org/immerrr/ample-regexps.el
Ample regular expressions — Compose and reuse Emacs regular expressions with ease.
If you ever tried to write more than a few related regexps and it felt that there should be a way to pick out their common parts and just plug them in without worrying about grouping and precedence, this package is for you.
Installation
ample-regexps is tested to work on Emacs 24. It should work on Emacs 23,
but there are no guarantees about that.
MELPA
ample-regexps is available on `MELPA <http://melpa.milkbox.net>`_, from where it
can be installed via::

    M-x package-install ample-regexps

If you haven't yet added MELPA repositories to your config, feel free to follow
`these instructions <http://melpa.milkbox.net/#/getting-started>`_ to do so.
el-get
The library is also available via `el-get <https://github.com/dimitri/el-get>`_::

    M-x el-get-install ample-regexps
Manual installation
Also, since this package has no dependencies, you can just drop the
ample-regexps.el file somewhere on load-path and enable it with:

.. code-block:: emacs-lisp

    (require 'ample-regexps)
Contributing
There are plenty of ways to help: use this package, spread the word, fix bugs, post bug reports or fresh ideas to the issue tracker, add tests, etc.
To participate in development, you'll probably need `cask <https://github.com/cask/cask>`_. The only dependency as of now is
ert-runner, so it's possible to run tests manually, but it's rather
inconvenient. It's a lot easier to just do:

.. code-block:: bash

    $ cask install
    $ make test
Documentation
Basic Usage
The main item of the API is the define-arx macro. Let's start with a simple
example:
.. code-block:: emacs-lisp

    (define-arx hello-world-rx '()) ;; -> hello-world-rx
    (hello-world-rx "Hello, world") ;; -> "Hello, world"
    (hello-world-rx (* "Hello, world")) ;; -> "\\(?:Hello, world\\)*"
define-arx defines a macro that converts s-exps into regular expressions.
If you're familiar with the `rx <http://git.savannah.gnu.org/cgit/emacs.git/tree/lisp/emacs-lisp/rx.el>`_
package (and if not, I encourage you to get acquainted with it), you're probably
starting to experience déjà vu. You're right: rx is used underneath, and
ample-regexps is just the cherry on top, adding customization with a hint
of syntactic sugar.
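For comparison, here is a small sketch of the same repetition written with plain rx, which ships with Emacs. The name ``my-hello-re`` is made up for this illustration and is not part of either package.

.. code-block:: emacs-lisp

    (require 'rx)

    ;; Hypothetical name, used only for this illustration.
    (defconst my-hello-re (rx (* "Hello, world"))
      "Zero or more repetitions of the greeting, built with plain `rx'.")

    my-hello-re ;; -> "\\(?:Hello, world\\)*"

    ;; The result is an ordinary regexp string, so it can be anchored and
    ;; passed to the usual matching functions.
    (string-match-p (concat "\\`" my-hello-re "\\'")
                    "Hello, worldHello, world") ;; -> 0
    (string-match-p (concat "\\`" my-hello-re "\\'")
                    "Hello") ;; -> nil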
Aliasing
Let's start with something simple and see how you can alias components to save some keystrokes:
.. code-block:: emacs-lisp

    (define-arx h-w-rx
      '((h "Hello, ")
        (w "world"))) ;; -> h-w-rx
    (h-w-rx h w) ;; -> "Hello, world"
    (h-w-rx (* h w)) ;; -> "\\(?:Hello, world\\)*"
Aliased literals are regexp-quoted, but you can alias a regular expression if you want:

.. code-block:: emacs-lisp

    (define-arx alnum-rx
      '((alpha_ (regexp "[[:alpha:]_]"))
        (alnum_ (regexp "[[:alnum:]_]")))) ;; -> alnum-rx
    (alnum-rx (+ alpha_) (* alnum_)) ;; -> "[[:alpha:]_]+[[:alnum:]_]*"
In fact, (regexp ...) is just an rx S-expression which you can compose
and nest arbitrarily to define even more forms:
.. code-block:: emacs-lisp

    (define-arx assignment-rx
      '((alpha_ (regexp "[[:alpha:]_]"))
        (alnum_ (regexp "[[:alnum:]_]"))
        (ws (* blank))
        (id (seq symbol-start (+ alpha_) (* alnum_) symbol-end)))) ;; -> assignment-rx
    (assignment-rx id ws "=" ws id)
    ;; -> "\\_<[[:alpha:]_]+[[:alnum:]_]*\\_>[[:blank:]]*=[[:blank:]]*\\_<[[:alpha:]_]+[[:alnum:]_]*\\_>"
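Since aliases can appear anywhere a regular rx form can, they combine with capture groups as well. The following is a minimal sketch of pulling both sides out of an assignment; ``my-parse-assignment`` is a hypothetical helper, not something the package provides.

.. code-block:: emacs-lisp

    (require 'ample-regexps)

    (define-arx assignment-rx
      '((alpha_ (regexp "[[:alpha:]_]"))
        (alnum_ (regexp "[[:alnum:]_]"))
        (ws (* blank))
        (id (seq symbol-start (+ alpha_) (* alnum_) symbol-end))))

    ;; Hypothetical helper, for illustration only.
    (defun my-parse-assignment (line)
      "Return (TARGET . SOURCE) if LINE looks like \"target = source\", else nil."
      (when (string-match
             (assignment-rx bol ws (group id) ws "=" ws (group id) ws eol)
             line)
        (cons (match-string 1 line) (match-string 2 line))))

    (my-parse-assignment "foo = bar_1") ;; -> ("foo" . "bar_1")
    (my-parse-assignment "1foo = bar")  ;; -> nil

The aliases stay readable at the call site, while the grouping and anchoring are ordinary rx forms.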
Custom S-expressions
Ok, this was all simple aliasing, but what if you want to add some custom S-expressions, too? Fear thou not, we've got you covered:
.. code-block:: emacs-lisp

    (define-arx cond-assignment-rx
      '((alpha_ (regexp "[[:alpha:]_]"))
        (alnum_ (regexp "[[:alnum:]_]"))
        (ws (* blank))
        (sym (:func (lambda (_form &rest args)
                      `(seq symbol-start (or ,@args) symbol-end))))
        (cond-keyword (sym "if" "elif" "while"))
        (id (sym (+ alpha_) (* alnum_))))) ;; -> cond-assignment-rx
    (cond-assignment-rx cond-keyword ws id ":" id ws "=" ws id)
    ;; -> "\\_<\\(?:elif\\|if\\|while\\)\\_>[[:blank:]]*\\_<\\(?:[[:alpha:]_]+\\|[[:alnum:]_]*\\)\\_>:\\_<\\(?:[[:alpha:]_]+\\|[[:alnum:]_]*\\)\\_>[[:blank:]]*=[[:blank:]]*\\_<\\(?:[[:alpha:]_]+\\|[[:alnum:]_]*\\)\\_>"
The (:func ...) plist lets you use a simple function that is passed all the
s-expressions from the form as arguments, with the first argument being the
form symbol itself. You can treat them as a list like above, or decompose and
name them to your liking (destructuring-bind, anyone?). Let's see how one could
write a matcher for a list of comma-separated values:
.. code-block:: emacs-lisp

    (define-arx csv-rx
      '((csv (:func (lambda (_form n arg)
                      `(seq ,@(nbutlast (cl-loop for i from 1 to n
                                                 collect `(group-n ,i ,arg)
                                                 collect ", ")))))))) ;; -> csv-rx
    (csv-rx (csv 3 (seq "foobar"))) ;; -> "\\(?1:foobar\\), \\(?2:foobar\\), \\(?3:foobar\\)"
There's a drawback to this: if you pass an incorrect number of arguments, you'll get an unreadable error message:

.. code-block:: emacs-lisp

    (csv-rx (csv 3 "foo" "bar"))
    ;; -> Wrong number of arguments: (lambda (_form n arg) (\` (seq (\,@ (nbutlast (cl-loop for i from 1 to n collect (\` (group-n (\, i) (\, arg))) collect ", ")))))), 4
To make this more readable, the form-function plist supports the :min-args and :max-args keywords:

.. code-block:: emacs-lisp

    (define-arx csv-rx
      '((csv (:func (lambda (_form n arg)
                      `(seq ,@(nbutlast (cl-loop for i from 1 to n
                                                 collect `(group-n ,i ,arg)
                                                 collect ", "))))
                    :min-args 2
                    :max-args 2)))) ;; -> csv-rx
    (csv-rx (csv 3 "foo" "bar")) ;; -> (error "rx form `csv' accepts at most 2 args")
    (csv-rx (csv 3)) ;; -> (error "rx form `csv' requires at least 2 args")
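Because the form function numbers its groups explicitly with group-n, the matched fields can be read back by index. Here is a short sketch under that assumption; ``my-split-triple`` is a hypothetical helper, and the anchors and ``(+ digit)`` are plain rx forms.

.. code-block:: emacs-lisp

    (require 'cl-lib)         ; cl-loop is used inside the form function
    (require 'ample-regexps)

    (define-arx csv-rx
      '((csv (:func (lambda (_form n arg)
                      `(seq ,@(nbutlast (cl-loop for i from 1 to n
                                                 collect `(group-n ,i ,arg)
                                                 collect ", "))))
                    :min-args 2
                    :max-args 2))))

    ;; Hypothetical helper, for illustration only.
    (defun my-split-triple (line)
      "Return the three comma-separated numbers in LINE as strings, or nil."
      (when (string-match (csv-rx bos (csv 3 (+ digit)) eos) line)
        (list (match-string 1 line)
              (match-string 2 line)
              (match-string 3 line))))

    (my-split-triple "10, 20, 30") ;; -> ("10" "20" "30")
    (my-split-triple "10, 20")     ;; -> nil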
Recursion
Form functions obviously can be made to support recursion. You may have
noticed that csv-rx only matches lists of exactly N elements. Let's fix it
to match any length up to N (you can achieve the same effect with a simple
loop, but I really wanted to avoid using factorial to show recursion):
.. code-block:: emacs-lisp

    (defun csv-opt (_form n elt &optional accum)
      (cond
       ((<= n 0) accum)
       ((null accum) (list _form (1- n) elt (list 'group-n n elt)))
       (t (list _form (1- n) elt (list 'group-n n elt `(opt ", " ,accum)))))) ;; -> csv-opt
    (define-arx csv-opt-rx
      '((csv-opt (:func csv-opt)))) ;; -> csv-opt-rx
    (csv-opt-rx (csv-opt 3 "foo")) ;; -> "\\(?1:foo\\(?:, \\(?2:foo\\(?:, \\(?3:foo\\)\\)?\\)\\)?\\)"
Written out as plain text, such expressions are hardly readable, let alone maintainable, but wrapped in a function call they don't seem scary at all.
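For reference, the simple loop mentioned earlier could look roughly like the sketch below. ``my-csv-opt-loop`` is a hypothetical name; it takes the same leading arguments as csv-opt and builds the nested form iteratively, so it should be usable as a ``:func`` in the same way. The check at the end uses only the built-in rx-to-string, since the returned form contains nothing but standard rx constructs.

.. code-block:: emacs-lisp

    (require 'rx)

    ;; Hypothetical name, for illustration only.
    (defun my-csv-opt-loop (_form n elt)
      "Return a nested `group-n' form matching 1 to N comma-separated ELTs."
      (let ((accum nil)
            (i n))
        ;; Build from the innermost (last) element outwards.
        (while (> i 0)
          (setq accum (if accum
                          `(group-n ,i ,elt (opt ", " ,accum))
                        `(group-n ,i ,elt))
                i (1- i)))
        accum))

    (my-csv-opt-loop 'csv-opt 2 "foo")
    ;; -> (group-n 1 "foo" (opt ", " (group-n 2 "foo")))

    ;; Hypothetical constant: the anchored regexp for up to three elements.
    (defconst my-csv-opt-re
      (rx-to-string `(seq bos ,(my-csv-opt-loop 'csv-opt 3 "foo") eos)))

    (string-match-p my-csv-opt-re "foo, foo")           ;; -> 0
    (string-match-p my-csv-opt-re "foo, foo, foo, foo") ;; -> nil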
Raw Power
Form functions can return raw regular expressions, too. This is, for example,
how you could backport the group-n form to Emacs 23, where it's not available (if
you had to):
.. code-block:: emacs-lisp

    (define-arx backport-rx
      '((group-n (:func (lambda (_form index &rest args)
                          (concat (format "\\(?%d:" index)
                                  (mapconcat (lambda (f) (rx-form f ':)) args "")
                                  "\\)")))))) ;; -> backport-rx
    (backport-rx (group-n 1 (seq "foo" (* "bar")))) ;; -> "\\(?1:foo\\(?:bar\\)*\\)"
The snippet above uses mapconcat and a bit of underdocumented rx
functionality. You can avoid that with the special convenience functions
arx-and and arx-or:
.. code-block:: emacs-lisp

    (define-arx backport-rx
      '((group-n (:func (lambda (_form index &rest args)
                          (concat (format "\\(?%d:" index)
                                  (arx-and args)
                                  "\\)")))))) ;; -> backport-rx
    (backport-rx (group-n 1 (seq "foo" (* "bar")))) ;; -> "\\(?1:foo\\(?:bar\\)*\\)"
Be warned, though: this is a power-user feature, and no extra grouping is performed, which may cause unexpected results:

.. code-block:: emacs-lisp

    (define-arx ungrouped-rx
      '((foo (:func (lambda (_form) "foo"))))) ;; -> ungrouped-rx
    (ungrouped-rx (foo) (foo)) ;; -> "foofoo"
    (ungrouped-rx (* (foo))) ;; -> "foo*"
To avoid surprises, make sure the resulting expressions are grouped.
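One way to do that is to wrap the raw string in a non-capturing group before returning it. The sketch below assumes nothing beyond the behaviour shown above; ``my-shy-group`` is a hypothetical helper, not part of ample-regexps.

.. code-block:: emacs-lisp

    (require 'ample-regexps)

    ;; Hypothetical helper, for illustration only.
    (defun my-shy-group (regexp)
      "Wrap REGEXP in a non-capturing group so postfix operators cover all of it."
      (concat "\\(?:" regexp "\\)"))

    (define-arx grouped-rx
      '((foo (:func (lambda (_form) (my-shy-group "foo"))))))

    ;; The repetition now applies to the whole word, not just the final "o".
    (string-match-p (grouped-rx bos (* (foo)) eos) "foofoo") ;; -> 0
    (string-match-p (grouped-rx bos (* (foo)) eos) "fooooo") ;; -> nil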
How Does This Work
(define-arx foobar-rx ...) is a macro that defines three things:

- a macro (foobar-rx ...) to be replaced by a constant during compilation
- a function (foobar-rx-to-string ...) that can be used at runtime
- a variable foobar-rx-constituents with form definitions to use

When either the function or the macro is called, the constituents variable is used
to override rx-constituents via dynamic scoping, and the rest is performed by
the rx-to-string function.
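The runtime function is handy when part of the expression is only known when the code runs. The sketch below assumes that the generated function takes a single quoted form as its first argument, mirroring rx-to-string; that signature is an assumption and is not spelled out above. ``greeting-rx`` and ``my-greeting-regexp`` are hypothetical names.

.. code-block:: emacs-lisp

    (require 'ample-regexps)

    (define-arx greeting-rx
      '((h "Hello, ")))

    ;; Hypothetical helper; assumes `greeting-rx-to-string' accepts one form,
    ;; like `rx-to-string' does.
    (defun my-greeting-regexp (name)
      "Return a regexp matching a greeting addressed to NAME."
      ;; NAME is spliced in as a string literal, so it is regexp-quoted.
      (greeting-rx-to-string `(seq bos h ,name eos)))

    (string-match-p (my-greeting-regexp "world") "Hello, world") ;; -> 0
    (string-match-p (my-greeting-regexp "a.b") "Hello, aXb")     ;; -> nil

The macro form cannot do this, because its arguments are turned into a constant during compilation.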
License
This package is provided under the terms and conditions of the GPLv3 license.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/ .