Possible issue with PCRE "^\\section"

Open priyadarshan opened this issue 8 years ago • 6 comments

Hi,

Trying to convert this PCRE:

https://regex101.com/r/qL1cX4/1

(rxt-pcre-to-elisp "^\\section") gives

"^[ 
^l^M ]ection"

Same thing with (rxt-pcre-to-rx "^\\section")

(seq bol (any 9 10 12 13 32) "ection")

Is that the correct behaviour?

priyadarshan, Apr 07 '16 16:04

After reading all the examples carefully, I realised that one needs to double the backslashes. I knew that was the case with elisp, but I thought pcre2el was meant to preserve exact Perl syntax?

The example regex works when written as

 (re-search-forward (rxt-pcre-to-elisp "^\\\\section") nil t)

That is fine of course, but it means one cannot use Perl regexes as-is, for example from a shared repository of filters; they have to be changed before they can be used.

Forgive my ignorance on the subject, but would it not be possible to fix that using a macro?
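
On the macro question, a quick check suggests that a macro alone would not help: macro arguments arrive after the Lisp reader has already processed the escapes in a string literal. This is only a minimal sketch using built-in functions; the name `my/arg-chars` is made up for illustration and is not part of pcre2el.

```elisp
;; Hypothetical helper, for illustration only.
;; The macro body runs at expansion time and sees STRING exactly as the
;; reader produced it, i.e. with escapes already processed.
(defmacro my/arg-chars (string)
  "Expand to the list of character codes in STRING as the macro received it."
  `(quote ,(string-to-list string)))

;; The literal "\\s" reaches the macro as two characters:
;; 92 (backslash) and 115 (s). The doubled backslash is already gone.
(my/arg-chars "\\s")
```

So by the time any macro could act, the text typed between the quotes is no longer available in its raw form.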

priyadarshan, Apr 14 '16 17:04

I was also wondering about this. Interestingly, re-builder with pcre syntax works with regexes like "^\\section", or (\d+).

It would be quite useful to have the same functionality, i.e., being able to enter real PCRE, like re-builder with pcre syntax above, but programmatically, for example (rxt-pcre-to-elisp "(\d+)").

Right now that does not work, I need to specify (rxt-pcre-to-elisp "(\\d+)") which is not a real PCRE anymore.

vsrepo, Apr 25 '16 12:04

@priyadarshan, this is the behavior we should expect. "\\s" in an Emacs string literal is read as \s, which in Perl matches any whitespace character (\t, \n, \f, \r, space); the `^[ ^l^M ]' you see is just the literal form of those characters.

If you want to match a literal \s, you have to say so by escaping the backslash with another backslash, which gives the PCRE \\s.

So you need to feed rxt-pcre-to-elisp the string "\\\\s"; what rxt-pcre-to-elisp then sees is just \\s.
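
The two escaping levels described above can be checked with built-in functions alone. The following is a small sketch, not part of pcre2el; the helper name `my/string-chars` is an assumption introduced here for inspection purposes.

```elisp
;; Hypothetical helper: split STRING into one-character strings so the
;; characters that a function actually receives can be counted.
(defun my/string-chars (string)
  "Return STRING as a list of one-character strings."
  (mapcar #'char-to-string (string-to-list string)))

;; The literal "^\\s" holds three characters: ^, one backslash, s.
;; That is the PCRE ^\s (whitespace at the start of a line).
(length (my/string-chars "^\\s"))      ; 3

;; The literal "^\\\\s" holds four characters: ^, two backslashes, s.
;; That is the PCRE ^\\s (a literal backslash followed by s).
(length (my/string-chars "^\\\\s"))    ; 4
```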

I think this is hard to solve the way Ruby/Perl do, unless elisp supports a new literal notation like /regexp/ instead of "regexp".

zw963, May 08 '16 05:05

I need to specify (rxt-pcre-to-elisp "(\\d+)") which is not a real PCRE anymore.

The same cause (see previous reply): if we need real PCRE, we need a new representation that bypasses the Emacs Lisp string-literal syntax, I think.

zw963, May 08 '16 05:05

Oh, @joddie (the author) seems to be travelling at the moment. I have another issue that is still unresolved, so we will need to wait ...

zw963, May 08 '16 05:05

The same cause (see previous reply): if we need real PCRE, we need a new representation that bypasses the Emacs Lisp string-literal syntax, I think.

Yes, that's what I thought. It would be quite useful, at least to some, to have a way to add real PCRE.

My use case is tracking hundreds of evolving Perl and PHP regexes upstream, which I need to use from elisp.

To be able to feed them as they are to pcre2el would be a veritable boon.

Then one could maintain a regex corpus purely in its original form.
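
One possible workaround, sketched here under assumptions, is to keep the corpus in a plain text file and read it at run time: text read from a file never passes through the Lisp reader, so backslashes stay exactly as written. The function names `my/read-pcre-lines` and `my/convert-pcre-file` and the one-pattern-per-line layout are assumptions made for this sketch; only `rxt-pcre-to-elisp` comes from pcre2el.

```elisp
;; Hypothetical helpers, not part of pcre2el.
;; Assumes one pattern per line, without /.../ delimiters or flags.
(defun my/read-pcre-lines (file)
  "Return the non-empty lines of FILE as a list of strings.
The text is read with `insert-file-contents', so backslashes are
taken literally and no Lisp string escaping is involved."
  (with-temp-buffer
    (insert-file-contents file)
    (split-string (buffer-string) "\n" t)))

(defun my/convert-pcre-file (file converter)
  "Apply CONVERTER to each pattern line of FILE and return the results."
  (mapcar converter (my/read-pcre-lines file)))

;; Usage, assuming pcre2el is loaded and patterns.txt contains lines
;; such as ^\\section or (\d+) written exactly as in Perl:
;; (my/convert-pcre-file "patterns.txt" #'rxt-pcre-to-elisp)
```

Passing the converter as an argument keeps the sketch usable without pcre2el installed; patterns that span several lines would need a different file layout.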

priyadarshan, May 08 '16 06:05