
Python 2 and Python 3 RE don't give the same result

Open • Cabu opened this issue 9 months ago • 1 comment

Bug Description

The Python RE flavor should distinguish between Python 2 and Python 3. I have also opened a bug report in cpython in case the problem comes from there: https://github.com/python/cpython/issues/109579

Reproduction steps

With the Python RE:

(?:^|(?<=>))([\s\S]*?)(?:(?=<)|$)

and the text:

<p>sentence 1</p><p>sentence 2</p>

run:

import re

html_text = "<p>sentence 1</p><p>sentence 2</p>"
pattern = re.compile(r"(?:^|(?<=>))([\s\S]*?)(?:(?=<)|$)")
# findall() already returns a list of the group 1 captures
sentence_list = pattern.findall(html_text)
print(sentence_list)

Expected Outcome

In Python 2 the result should be:



sentence 1
sentence 1


sentence 2
sentence 2


and in Python 3 the result should be:



<p>sentence 1
<p>sentence 1


<p>sentence 2
<p>sentence 2


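For reference, the two outcomes can be written as `findall` results. This is only a sketch: the constant names are illustrative, and the 3.7 cutoff comes from the `re` changelog rather than from this report (Python 3.0 to 3.6 are assumed to behave like Python 2 here).

```python
import re
import sys

html_text = "<p>sentence 1</p><p>sentence 2</p>"
pattern = re.compile(r"(?:^|(?<=>))([\s\S]*?)(?:(?=<)|$)")

# Outcome described for Python 2 (empty matches included).
LEGACY_RESULT = ["", "sentence 1", "", "sentence 2", ""]
# Outcome described for Python 3 (assumed to mean 3.7 or later).
MODERN_RESULT = ["", "<p>sentence 1", "", "<p>sentence 2", ""]

result = pattern.findall(html_text)
expected = MODERN_RESULT if sys.version_info >= (3, 7) else LEGACY_RESULT

# Print the interpreter version next to the result so the two can be compared.
print(sys.version_info[:2], result, result == expected)
```
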
Browser

Firefox

OS

Windows & Linux

Cabu, Sep 19 '23 14:09

Hello @Cabu,

The site currently reflects Python 2. Python 3 support is being worked on; please refer to the linked issues in this comment: https://github.com/firasdib/Regex101/issues/1464#issuecomment-758062925

As for whether the site is consistent with what Python 2 returns: it is. Compare https://regex101.com/r/RM1Hdc/1 vs Python 2.

Python 3 does include the <p> when using the same regex, which makes sense since Python 3's re module is more aligned with PCRE.
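
To see where the <p> comes from, it can help to print the match spans. The `re` documentation notes that, as of Python 3.7, a non-empty match can start just after a previous empty match. The sketch below (variable names are illustrative) shows the empty match at position 0 being followed by a non-empty match that also starts at 0; on older interpreters the second match would be expected to start after the `<p>` instead.

```python
import re

html_text = "<p>sentence 1</p><p>sentence 2</p>"
pattern = re.compile(r"(?:^|(?<=>))([\s\S]*?)(?:(?=<)|$)")

# Collect (start, end, group 1) for every match, including empty ones.
spans = [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(html_text)]

for start, end, text in spans:
    print(start, end, repr(text))
```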

If you're looking for consistency across Python versions, you could, for example, use [^<>] instead of [\s\S]. There are generally many ways of getting the same match, so you might want to drop into our IRC or Discord live support to bounce ideas off other people.
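
A minimal sketch of that suggestion, assuming the goal is just the text between tags. Because [^<>] cannot consume a `<`, the match can no longer start on the tag itself. Filtering out the empty matches is an extra step added here for illustration and is not part of the original pattern.

```python
import re

html_text = "<p>sentence 1</p><p>sentence 2</p>"
# Same pattern as in the report, with [\s\S] swapped for [^<>].
portable = re.compile(r"(?:^|(?<=>))([^<>]*?)(?:(?=<)|$)")

matches = portable.findall(html_text)
# Drop the empty matches found at tag boundaries (illustrative extra step).
sentences = [m for m in matches if m]

print(matches)
print(sentences)
```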

Hope this helped :)

working-name, Sep 20 '23 03:09