Regex101
Python 2 and Python 3 RE don't give the same result
Bug Description
The Python RE flavor should distinguish between Python 2 and Python 3. I have opened a bug report in cpython in case the problem comes from there: https://github.com/python/cpython/issues/109579
Reproduction steps
With the python RE:
(?:^|(?<=>))([\s\S]*?)(?:(?=<)|$)
and the text:
<p>sentence 1</p><p>sentence 2</p>
import re
html_text = "<p>sentence 1</p><p>sentence 2</p>"
pattern = re.compile(r"(?:^|(?<=>))([\s\S]*?)(?:(?=<)|$)")
sentence_list = list(pattern.findall(html_text))
Expected Outcome
In Python 2 the result should be:
sentence 1
sentence 2
and in Python 3 the result should be:
<p>sentence 1
<p>sentence 2
Browser
Firefox
OS
Windows & Linux
Hello @Cabu,
The site currently reflects Python 2. Python 3 support is being worked on; please refer to the linked issues in this comment: https://github.com/firasdib/Regex101/issues/1464#issuecomment-758062925
As far as the site being consistent with what python 2 returns - it is: https://regex101.com/r/RM1Hdc/1 vs python 2
Python 3 does include the <p> using the same regex, which makes sense since Python 3's re module is more aligned with PCRE.
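For anyone who wants to check this locally, here is a minimal sketch that prints the raw findall() result, empty matches included. It assumes Python 3.7 or later, where the re documentation notes that non-empty matches can start just after a previous empty match; the names all_groups and non_empty are only illustrative.

```python
import re

html_text = "<p>sentence 1</p><p>sentence 2</p>"
pattern = re.compile(r"(?:^|(?<=>))([\s\S]*?)(?:(?=<)|$)")

# findall() returns the content of group 1 for every match,
# including the empty matches found right before each "<" and at the end.
all_groups = pattern.findall(html_text)
print(all_groups)
# On Python 3.7+: ['', '<p>sentence 1', '', '<p>sentence 2', '']

# Drop the empty matches to keep only the captured text.
non_empty = [g for g in all_groups if g]
print(non_empty)
# On Python 3.7+: ['<p>sentence 1', '<p>sentence 2']
```

The empty match at position 0 is what lets the next match start at the same position and swallow the opening tag. Older interpreters instead move one character forward after an empty match, so the tag is skipped.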
If you're looking for consistency across Python versions, you could, for example, use [^<>] instead of [\s\S]. There are generally many ways of getting the same match; you might want to drop in to our IRC or Discord live support to bounce ideas off other people.
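As a rough sketch of that suggestion, the pattern from the report can be kept as is, with only the character class changed. Filtering out the empty matches is an extra step added here for readability and was not part of the original snippet.

```python
import re

html_text = "<p>sentence 1</p><p>sentence 2</p>"

# Same pattern as in the report, with [\s\S] replaced by [^<>]
# so the captured text can never contain a tag delimiter.
pattern = re.compile(r"(?:^|(?<=>))([^<>]*?)(?:(?=<)|$)")

# findall() also yields empty strings for the empty matches; drop them.
sentence_list = [s for s in pattern.findall(html_text) if s]
print(sentence_list)  # ['sentence 1', 'sentence 2']
```

Because [^<>] cannot consume a "<", a match can no longer start on the opening tag, so the result should not depend on how the interpreter handles the position right after an empty match.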
Hope this helped :)