Add support for regex flags in `.re()` and `.re_first()` methods

Open noviluni opened this issue 3 years ago • 2 comments

There are cases where I need a regex to match across multiple lines, and the only workaround I found was compiling the expression and passing a regex flag there.

Look at this example, where I want to extract the body of the JavaScript function `example()` (I know it is not really a valid function; it is just an example):

>>> import re
>>> from parsel import Selector
>>> text = """
... <script>
...     function example() {
...         "name": "Adrian",
...         "points": 3,
...     }
... </script>
... """
>>> sel = Selector(text=text)

# using a plain regex string doesn't work: by default "." does not match newlines, so this returns None
>>> sel.css('script').re_first(r"example\(\) ({.*})")

# I need to compile the expression:
>>> regex = re.compile(r"example\(\) ({.*})", flags=re.DOTALL)
>>> sel.css('script').re_first(regex)
'{\n        "name": "Adrian",\n        "points": 3,\n    }'
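
For context, the difference comes from how `.` treats newlines. The following is a minimal sketch using only the standard `re` module (no parsel involved); the variable names are illustrative. It also shows an inline `(?s)` flag, which is standard Python regex syntax and may serve as another workaround when only a pattern string can be passed.

```python
import re

# Same markup as in the example above, as a plain string.
script = '''<script>
    function example() {
        "name": "Adrian",
        "points": 3,
    }
</script>'''

pattern = r"example\(\) ({.*})"

# Without flags, "." does not match "\n", so "{.*}" cannot span lines.
no_flags = re.search(pattern, script)

# With re.DOTALL, "." also matches "\n", so the whole block is captured.
with_dotall = re.search(pattern, script, flags=re.DOTALL)

# The inline flag (?s) at the start of the pattern has the same effect.
with_inline = re.search(r"(?s)" + pattern, script)

print(no_flags)               # no match object
print(with_dotall.group(1))   # the "{ ... }" block
print(with_inline.group(1) == with_dotall.group(1))
```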

Doing this requires extra steps that could be avoided by adding support for regex flags to the `re_first()` and `re()` methods, and that's what I did. With this new implementation, you can pass the flag directly:

>>> sel.css('script').re_first(r"example\(\) ({.*})", flags=re.DOTALL)
'{\n        "name": "Adrian",\n        "points": 3,\n    }'
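
To illustrate the general idea of forwarding a `flags` argument, here is a rough standalone sketch. It is not parsel's actual implementation: `first_match` is a hypothetical helper written for this comment, and it works on plain strings rather than selectors. It assumes that flags only apply when the pattern is given as a string, mirroring how `re.compile()` rejects flags for an already compiled pattern.

```python
import re
from typing import Optional, Pattern, Union


def first_match(
    pattern: Union[str, Pattern[str]], text: str, flags: int = 0
) -> Optional[str]:
    """Hypothetical helper: return the first match of pattern in text, or None."""
    if isinstance(pattern, str):
        # Flags are applied only when we compile the pattern ourselves.
        compiled = re.compile(pattern, flags)
    elif flags:
        # A compiled pattern already carries its own flags.
        raise ValueError("cannot pass flags together with a compiled pattern")
    else:
        compiled = pattern

    match = compiled.search(text)
    if match is None:
        return None
    # Return the first group if the pattern defines one, else the whole match.
    return match.group(1) if compiled.groups else match.group(0)


text = 'example() {\n    "points": 3,\n}'
print(first_match(r"example\(\) ({.*})", text))                   # None
print(first_match(r"example\(\) ({.*})", text, flags=re.DOTALL))  # the block
```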

This would also help a lot when you need case-insensitive regexes:

>>> text = 'Price: 1000.00€'
>>> sel = Selector(text=text)

# This works:
>>> sel.re_first(r'Price: ([\d.]+)€')
'1000.00'

# however, when the text is lowercased, it stops matching:
>>> text2 = 'price: 1000.00€'
>>> sel2 = Selector(text=text2)
>>> sel2.re_first(r'Price: ([\d.]+)€')

# with the new implementation you can directly do:
>>> sel2.re_first(r'Price: ([\d.]+)€', flags=re.I)
'1000.00'
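
Since `flags` is an ordinary integer bitmask in the standard `re` module, several flags can be combined with `|`. The snippet below is a small stdlib-only sketch of that behaviour (the sample strings are made up for illustration); a `flags` parameter on `.re()` and `.re_first()` would presumably accept the same combined values.

```python
import re

text = "price: 1000.00€"
pattern = r"Price: ([\d.]+)€"

# Case-sensitive by default: "Price" does not match "price".
case_sensitive = re.search(pattern, text)

# re.I (re.IGNORECASE) makes the literal part match regardless of case.
case_insensitive = re.search(pattern, text, flags=re.I)

# Flags combine with bitwise OR: ignore case and let "." match a newline.
multiline_text = "price:\n1000"
combined = re.search(r"PRICE:.(\d+)", multiline_text, flags=re.I | re.DOTALL)

print(case_sensitive)
print(case_insensitive.group(1))
print(combined.group(1))
```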

Let me know your thoughts :)

Aug 07 '21 20:08 noviluni

Codecov Report

Merging #225 (df0f41b) into master (d20db09) will not change coverage. The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #225   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines          291       292    +1     
  Branches        51        51           
=========================================
+ Hits           291       292    +1     

Impacted Files        Coverage Δ
parsel/selector.py    100.00% <100.00%> (ø)
parsel/utils.py       100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update d20db09...df0f41b.

Aug 07 '21 20:08 codecov[bot]

@Gallaecio @wRAR, could you take a look? :)

Aug 07 '21 20:08 noviluni