parsel
Add support for regex flags in `.re()` and `.re_first()` methods
In some cases I need a regex to match across multiple lines, and the only workaround I found was to compile the expression with a regex flag.
Consider this example, where I want to extract the body of the JavaScript function `example()` (I know it is not really a valid function; it is just an example):
>>> import re
>>> from parsel import Selector
>>> text = """
... <script>
... function example() {
...  "name": "Adrian",
...  "points": 3,
...  }
... </script>
... """
>>> sel = Selector(text=text)
# A plain regex string doesn't match, because "." does not match newlines by default (returns None):
>>> sel.css('script').re_first(r"example\(\) ({.*})")
# Instead, I need to compile the expression with re.DOTALL:
>>> regex = re.compile(r"example\(\) ({.*})", flags=re.DOTALL)
>>> sel.css('script').re_first(regex)
'{\n "name": "Adrian",\n "points": 3,\n }'
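For readers less familiar with `re.DOTALL`, here is a stdlib-only sketch (no parsel involved) of why the plain string fails: by default `.` does not match a newline, so `{.*}` cannot span the multi-line body. The `script` string is an assumed stand-in for the `<script>` element above, with simplified indentation.

```python
import re

# Assumed stand-in for the <script> element above (indentation simplified).
script = '<script>\nfunction example() {\n "name": "Adrian",\n "points": 3,\n }\n</script>'
pattern = r"example\(\) ({.*})"

# Default flags: "." does not match "\n", so "{.*}" cannot reach the closing brace.
without_flag = re.search(pattern, script)
print(without_flag)  # None

# re.DOTALL lets "." match "\n" too, so the group spans the whole multi-line body.
with_flag = re.search(pattern, script, flags=re.DOTALL)
print(repr(with_flag.group(1)))  # '{\n "name": "Adrian",\n "points": 3,\n }'
```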
This requires extra steps that could be avoided if the `re_first()` and `re()` methods accepted regex flags, and that is what this PR adds. With the new implementation, you can pass the flags directly:
>>> sel.css('script').re_first(r"example\(\) ({.*})", flags=re.DOTALL)
'{\n "name": "Adrian",\n "points": 3,\n }'
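Conceptually, the change is plumbing: a string pattern is compiled with the caller's flags before matching. Below is a minimal sketch under that assumption. `extract_regex_sketch` and `re_first_sketch` are hypothetical names, and this is not the code in this PR or in parsel.

```python
import re
from typing import List, Optional, Union


def extract_regex_sketch(
    regex: Union[str, re.Pattern], text: str, flags: int = 0
) -> List[str]:
    """Hypothetical helper (not parsel's code): all matches of ``regex`` in
    ``text``, compiling string patterns with ``flags``."""
    # re.compile() returns an already-compiled pattern unchanged when flags == 0,
    # and raises ValueError when non-zero flags are combined with a compiled pattern.
    compiled = re.compile(regex, flags)
    results: List[str] = []
    for found in compiled.findall(text):
        # findall() yields tuples when the pattern has several groups; flatten them.
        if isinstance(found, tuple):
            results.extend(found)
        else:
            results.append(found)
    return results


def re_first_sketch(
    regex: Union[str, re.Pattern],
    text: str,
    default: Optional[str] = None,
    flags: int = 0,
) -> Optional[str]:
    """Hypothetical counterpart of re_first(): the first match, or ``default``."""
    return next(iter(extract_regex_sketch(regex, text, flags)), default)
```

One design choice in this sketch: because `re.compile()` rejects a non-zero `flags` argument combined with an already-compiled pattern, mixing the two raises `ValueError` instead of silently ignoring the flags. Whether the PR behaves the same way is an assumption.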
This would also help a lot with case-insensitive regexes:
>>> text = 'Price: 1000.00€'
>>> sel = Selector(text=text)
# This works:
>>> sel.re_first(r'Price: ([\d.]+)€')
'1000.00'
# However, it stops matching once the text is lowercase (returns None):
>>> text2 = 'price: 1000.00€'
>>> sel2 = Selector(text=text2)
>>> sel2.re_first(r'Price: ([\d.]+)€')
# With the new implementation you can pass re.I directly:
>>> sel2.re_first(r'Price: ([\d.]+)€', flags=re.I)
'1000.00'
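The same behaviour can be reproduced with the standard library alone (again a sketch, no parsel involved). It also shows Python's inline flag syntax, `(?i)`, which embeds the same flag in the pattern string itself. This is standard `re` syntax offered for comparison with a `flags` argument, not something this PR introduces.

```python
import re

text2 = "price: 1000.00€"
pattern = r"Price: ([\d.]+)€"

# Case-sensitive by default: "Price" does not match "price".
print(re.search(pattern, text2))  # None

# re.I (alias of re.IGNORECASE) makes the match case-insensitive.
print(re.search(pattern, text2, flags=re.I).group(1))  # 1000.00

# The inline flag "(?i)" at the start of the pattern has the same effect,
# because the flag travels inside the pattern string.
print(re.search(r"(?i)" + pattern, text2).group(1))  # 1000.00
```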
Let me know your thoughts :)
Codecov Report
Merging #225 (df0f41b) into master (d20db09) will not change coverage. The diff coverage is 100.00%.
@@           Coverage Diff            @@
##            master      #225   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files            5         5
  Lines          291       292    +1
  Branches        51        51
=========================================
+ Hits           291       292    +1
| Impacted Files | Coverage Δ |
|---|---|
| parsel/selector.py | 100.00% <100.00%> (ø) |
| parsel/utils.py | 100.00% <100.00%> (ø) |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d20db09...df0f41b.
@Gallaecio @wRAR, could you take a look? :)