
Make regex <{}> family more prominent

Open librasteve opened this issue 5 years ago • 12 comments

Problem or new feature

Some folks have missed the capabilities of the various regex interpolation bits (https://docs.raku.org/language/regexes#Regex_interpolation):

  • e.g. this SO question: https://stackoverflow.com/questions/64909029/is-it-possible-to-do-boolean-assertions-with-raku-regex
  • I especially missed '<@' for a while

Suggestions


Maybe a page on regex "power tools"?

librasteve, Nov 22 '20 19:11

Maybe you wanted to link something with "this SO"? Thanks a lot for the issue.

JJ, Nov 23 '20 06:11

red face

librasteve, Nov 23 '20 08:11

There's a table that shows only four types of interpolation: https://docs.raku.org/language/regexes#Regex_interpolation

It just has <{}>, and doesn't show anything like <?{}>.
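As an illustration of what a second table might cover, here is a minimal sketch (assuming a current Rakudo) of the `<?{ ... }>` boolean assertion asked about in the linked SO question, together with its negated form `<!{ ... }>`. The octet range check is only an illustrative example.

```raku
# <?{ ... }> lets the match continue only if the code block returns a true value.
# Inside the block, $0 is the first positional capture of the match in progress.
my $octet = rx/ ^ (\d ** 1..3) <?{ $0 < 256 }> $ /;

say so '192' ~~ $octet;    # True
say so '300' ~~ $octet;    # False: 300 is not below 256

# <!{ ... }> is the negated form: it succeeds when the block returns a false value.
my $too-big = rx/ ^ (\d ** 1..3) <!{ $0 < 256 }> $ /;

say so '300' ~~ $too-big;  # True
say so '192' ~~ $too-big;  # False
```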

Maybe it should be expanded? A second table added?

doomvox, Nov 23 '20 15:11

Don't have any particular preference here. Your call.

JJ, Nov 23 '20 16:11

Hi - perhaps I can inject a couple of thoughts...(?)

It feels to me that Regex Interpolation is a bit of a difference from, and improvement over, Perl 5.

The topic gets introduced near the top of this explanatory page with the interpolation of named regexen, something like this:

    "Japanese pottery rocks!" ~~ / <$regex> /; # Interpolation of $regex into /.../
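A minimal sketch (assuming a current Rakudo) of the distinction this part of the page is about: `<$regex>` runs a Regex object, a bare `$pattern` interpolates a string literally, and `<$pattern>` compiles the same string as a regex. The names `$regex` and `$pattern` and the value `'p.ttery'` are only illustrative.

```raku
my $regex = rx/ \w+ ' rocks' /;
# <$regex> runs the Regex object stored in $regex as part of the match.
say so "Japanese pottery rocks!" ~~ / <$regex> /;    # True

my $pattern = 'p.ttery';
# A bare $pattern is matched literally, so the '.' would have to be a real dot.
say so "Japanese pottery rocks!" ~~ / $pattern /;    # False
# <$pattern> compiles the string as a regex, so '.' matches any character.
say so "Japanese pottery rocks!" ~~ / <$pattern> /;  # True
```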

Then the items in the table you link come about 20 sections further down, so very few people get that far before zoning out (unless they know what they are looking for).

How about lifting the table up just below the named regex section?

Anyway - just a thought...

Finally, there are items that I think should be supported but are not there yet:

    <@list-of-matches>   # matches any item in the list, right?
    <%????>              # is there an intent for hashes (maybe match any key in the hash)?

Don't know if these are worth a mention...?

librasteve, Nov 23 '20 17:11

How about lifting the table up just below the named regex section

Moving the table up sounds good.

Finally, then there are items that I think should be supported - but are not there yet: <@list-of-matches> #matches any item in the list, right?

Ah, thanks for explaining that it's unsupported. You mentioned it before and I wondered why I couldn't figure out how to use it.

<%????> #is there an intent for hashes (maybe match any key in the hash)

Were it up to me, I might say it should do hash lookups, and match if the value evaluates to True but not if it evaluates to False.

Don't know if these are worth a mention...?

If there's a clear plan to include them, I'd mention them.

doomvox, Nov 25 '20 19:11

<%????> #is there an intent for hashes (maybe match any key in the hash)

Were it up to me, I might say it should do hash lookups, and match if the value evaluates to True but not if it evaluates to False.

How about matching on the key (since that's usually a Str) and having the Capture (e.g. $0) give back the value... ;-)

librasteve, Nov 26 '20 08:11

I've posted a new SO question to see if anyone knows: https://stackoverflow.com/questions/65018490/raku-regex-list-interpolation

librasteve, Nov 26 '20 08:11

I would say the first place to look for historical design thoughts about Raku is design.raku.org.

(NB. The "Design" documents at design.raku.org are called "specs" for good reason -- to emphasize their nature as ultimately speculative specification. Imo nothing related to any of these speculations belongs in the doc unless the "specified" features are already in the actual specification, aka roast.)


For regexes the relevant design doc is S05.


An in-page search for "<@" and "<%" yielded zero matches. So then I tried "%" and "@". I didn't see any relevant design discussion when going thru the % matches, but saw lots when trawling thru the first few @ matches. I share some of it in this comment.

NB. This comment may be wildly short of what's in the doc. I only went thru some of the @ search, and even that was enough that I saw relevant text which I just happened to notice -- it wasn't directly caught by the search. So, to be crystal clear, I did not look thru the whole document, nor even the whole of clearly relevant sections such as the section discussing "Extensible metasyntax <...>". If folk truly want to know what @Larry thought, which mostly means what Larry thought, they really should laboriously trawl thru that document. I've done some work to illustrate the catch that such a trawl may reveal but I stopped once I felt I had a sufficient catch to be useful and demonstrate there's almost certainly much more interesting stuff to be found.

https://design.raku.org/S05.html#line_1635: A leading @ matches like a bare array except that each element is treated as a subrule (string or Regex object) rather than as a literal. That is, a string is forced to be compiled as a subrule instead of being matched literally. (There is no difference for a Regex object.) This assertion is not automatically captured.
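S05 describes the leading `@` form by contrast with a bare array. As a hedged sketch of that bare-array behavior, as documented for current Raku: an interpolated array matches any one of its elements, with strings matched literally and Regex objects matched as regexes. The `<@...>` form itself is deliberately not shown, since whether it is implemented is the open question in this thread; `@words` and `@patterns` are illustrative names.

```raku
my @words = 'cat', 'd.g';
# Strings in an interpolated array are alternatives matched literally.
say so 'hotdog' ~~ / @words /;     # False: 'd.g' needs a literal dot
say so 'a d.g!' ~~ / @words /;     # True

# Regex objects in the array are matched as regexes rather than literally.
my @patterns = 'cat', rx/ d.g /;
say so 'hotdog' ~~ / @patterns /;  # True
```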

https://design.raku.org/S05.html#line_1644: The use of a hash as an assertion is reserved.

https://design.raku.org/S05.html#line_1876: A leading * indicates that the following pattern allows a partial match. It always succeeds after matching as many characters as possible. (It is not zero-width unless 0 characters match.) For instance, to match a number of abbreviations, you might write any of:

    s/ ^ G<*n|enesis>     $ /gen/  or
    s/ ^ Ex<*odus>        $ /ex/   or
    s/ ^ L<*v|eviticus>   $ /lev/  or
    ...

https://design.raku.org/S05.html#line_3970: Array aliasing ... @<from>=...

raiph, Nov 26 '20 15:11

OK, I read the section @raiph refers to. Well, if the words "Extensible Metasyntax" did not cause newbies (and middlebies) to run away, I would suggest taking this as a logical sequence for a (completely) rewritten section in the regex docs (if it made the actual language | implementation). Or, probably better, keep the name of "Regex Interpolation" and, in addition to promoting its visibility, add a subsection with the other members of the zoo mentioned in S05.

Hmmm - just googled 'site:docs.raku.org <foo=bar>' -> got nothing! Sooo, on second thoughts, I think there should be a new doc page, distinct from, and interlinked with, https://docs.raku.org/language/regexes. That way the current regex section, which is quite long, is proven and 'correct', and reflects the Perl 5 RE stuff, can stay unmolested. The new page would be titled "regex interpolation" and would follow the line of the S05 Extensible Metasyntax section, reflecting and maybe extending/including the current table (?).

Obviously all this needs to be checked against what is implemented.

Also can mop up the footnote from https://docs.raku.org/syntax/regex, namely "A list of predefined subrules is listed in S05-regex of design documents."

librasteve, Nov 26 '20 19:11

NB. While I know the design docs quite well, they're frozen, and there are no processes or judgment calls to consider. In contrast the end user docs are evolving, I don't know them or the process well, and the following reflects my lack of knowledge of these things.

a logical sequence for ... section in the regex docs

Is there no such "logical sequence" of assertions (<...>) in the existing doc, even if some/many entries are missing?

I do see merit in organizing a document the way Larry did, namely to have a section focused entirely on <...> assertions within regexes, and then enumerating the options one by one. Or perhaps better still, giving each its own page, but having another page that transcludes them all.

If such a section existed I'd suggest a preamble entry also be written discussing <...> in the context of regexes more generally, because it's also used in relation to regexes in forms that aren't assertions within regexes, but in a manner that's overall strangely consistent with them and with other forms of use in Raku.

(completely) rewritten

To quote from an email I wrote today in another context but also about the docs:

If you are willing to put a great deal of effort into helping evolve the doc by both patiently discussing your concerns AND also writing and/or rewriting doc in accord with a mandate to do so based on a rough consensus, where rough doesn't mean unfriendly to any of those involved, then I am ... with you, and may even be talked into helping a bit.

Otherwise, I think that by far the most important thing is that we support folk who both really care about improving the doc and who will keep working on it in a manner that helps sustain progress for both our doc and that of the individuals doing the work.

(if it made the actual language | implementation).

I think the criteria would be better generally limited to a subset of:

  • Made roast. A lot of stuff that's in Rakudo does not belong in the doc.

  • Considered desirable. Imo some of the stuff in roast is not.^1

Even if something being considered passes those two hurdles, I still think any move to include it in the doc ought generally be discussed at suitable length (defaulting to a longer period) until there's a clear consensus on including it, before even working toward a draft PR.

Of course, there'll be exceptions, but imo something like the above ought be the norm.

keep the name of "Regex Interpolation"

I think "interpolation" is generally understood to mean inserting something in something else, i.e. "regex interpolation" would suggest inserting a non-regex thing into a regex. I'd say most of the <...> assertions are regex things.

just googled 'site:docs.raku.org <foo=bar>' -> got nothing!

I got 178 matches, without the quotes. Perhaps you meant with '<foo=bar>'?

But if your point is the general one of google searchability, well, Raku is a case study in being google unfriendly (or google being Raku unfriendly).

Obviously all this needs to be checked against what is implemented.

Imo, against roast.

Also can mop up the footnote from https://docs.raku.org/syntax/regex, namely "A list of predefined subrules is listed in S05-regex of design documents."

Again, imo any such action ought be against roast.


^1 Consider, for example, the arguably crazy transliteration functionality.

raiph, Nov 27 '20 21:11

Thank you @raiph for clarifying my opinions. NB to all: I am writing as a consumer of the docs, and anything I mentioned should be construed as a loose suggestion at best, certainly NOT as advice / guidance / instructions. Kudos and a free hand to those who are doing the work, guided by the processes.

On a couple of specifics:

  • I'm glad to hear that we agree on the
  • I am not necessarily for "Regex Interpolation" as the generic name for the <...> things in regexes (but, for the reasons mentioned I am rather against "Extensible Metasyntax" as a title)
  • Searching on <foo=bar> is not a wise thing anyway (apologies for the red herring) - my intent was to try and locate where the docs do mention the <foo=bar> concept... for the record, it's here: https://docs.raku.org/syntax/regex

To give the capture a different name from the regex, use the syntax <capture-name=named-regex>. If no capture is desired, a leading dot or ampersand will suppress it: <.named-regex> if it is a method declared in the same class or grammar, <&named-regex> for a regex declared in the same lexical context.
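A small sketch (assuming a current Rakudo) of the three forms in the quoted doc text: an aliased capture, a non-capturing method call with a leading dot, and a non-capturing call to a lexical regex with `&`. The `KeyValue` grammar, its `word` and `sep` tokens, and the lexical `digits` regex are hypothetical names chosen for illustration.

```raku
grammar KeyValue {
    # <key=word> and <value=word> call the same token but store the
    # results under the names "key" and "value".
    # <.sep> calls the sep token without capturing anything.
    token TOP  { <key=word> <.sep> <value=word> }
    token word { \w+ }
    token sep  { \s* '=' \s* }
}

my $m = KeyValue.parse('colour = blue');
say ~$m<key>;            # colour
say ~$m<value>;          # blue
say $m<sep>:exists;      # False: <.sep> did not capture

# A regex declared in the same lexical scope is called with <&name>,
# which also does not capture.
my regex digits { \d+ }
my $n = 'room 101' ~~ / \w+ \s <&digits> /;
say ~$n;                 # room 101
say $n<digits>:exists;   # False
```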

And 100% agree that my phrase "what is implemented" should be "what is in roast".

librasteve, Nov 28 '20 10:11