What is an 'atom'?

Open zoffixznet opened this issue 7 years ago • 2 comments

The docs for <ws> say the token is autoplaced after an "atom", but what exactly is an atom in a grammar doesn't seem to be explained anywhere.

zoffixznet, Jul 23 '18 12:07

Any part of a regex that you can identify as having any meaning is an atom.

So for example in \d ** 4, the \d and 4 are terms, then ** is an operator, and all three of those are atoms.

The statement that <.ws> is automatically inserted after an atom isn't always quite true. In the example above, that would lead to \d <.ws> ** <.ws> 4, which makes no sense. A more precise wording would be that <.ws> is placed after terms and closing parentheses/brackets, and that operators typically special-case <.ws> handling.
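
To make the "after terms" wording concrete, here is a minimal sketch using the :sigspace adverb (what rule turns on in a grammar). The sample strings are made up for illustration, and the hand-expanded pattern is only an approximation of what the compiler does, not its literal output.

```raku
# \d ** 4 is one quantified term: four digits in a row.
say ('2018' ~~ / \d ** 4 /).Str;           # 2018

# Without :sigspace, whitespace in the pattern is ignored.
say so 'ab'  ~~ m/ a b /;                  # True

# With :sigspace, the whitespace after the terms a and b
# behaves like an implicit <.ws> call.
say so 'a b' ~~ m:s/ a b /;                # True
say so 'ab'  ~~ m:s/ a b /;                # False: <.ws> can't match between two word characters

# Roughly the same pattern with the <.ws> calls written out by hand.
say so 'a b' ~~ m/ a <.ws> b <.ws> /;      # True
say so 'ab'  ~~ m/ a <.ws> b <.ws> /;      # False
```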

moritz, Jul 24 '18 18:07

I'm reopening this because the concept of an atom in a regex probably deserves its own fleshed-out subsection somewhere. The answer to this question on SO relied entirely on clarifying this concept. Additionally, non-capturing grouping also creates an atom, and the docs should say so.
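
A small sketch of the grouping point, with a made-up input string: a quantifier applies to the single atom before it, so wrapping several characters in [ ] (or in capturing parentheses) changes what gets repeated.

```raku
my $s = 'ababab';

# Here + applies only to the atom b.
say ($s ~~ / ab+ /).Str;       # ab

# [ab] is a single atom, so + repeats the whole group.
say ($s ~~ / [ab]+ /).Str;     # ababab

# A capturing group is an atom too; each repetition is captured.
say ($s ~~ / (ab)+ /).Str;     # ababab
say $/[0].elems;               # 3
```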

JJ, Jul 05 '20 07:07