
Adding a `strip` kwarg to `get()` and `getall()`

Open: bblanchon opened this issue 3 years ago • 1 comment

Hi,

Thank you very much for this excellent library ❤️

I've been using Parsel for a while and I constantly find myself calling .strip() after .get() or .getall(). I think it would be very helpful if Parsel provided a built-in mechanism for that.

I suggest adding a strip kwarg to get() and getall(). It would be a boolean value, and when it's true, Parsel would call strip() on every match.

Example with get():

# Before
author = selector.css("[itemprop=author] [itemprop=name]::text").get()
if author:
    author = author.strip()

# After
author = selector.css("[itemprop=author] [itemprop=name]::text").get(strip=True)

Example with getall():

# Before
authors = [author.strip() for author in selector.css("[itemprop=author] [itemprop=name]::text").getall()]

# After
authors = selector.css("[itemprop=author] [itemprop=name]::text").getall(strip=True)

Alternatively, we could change the ::text pseudo-element to support an argument, like ::text(strip=1). That would be extremely handy too and probably more flexible than my original suggestion, but also more difficult to implement.

I know I could strip whitespace with `re()` and `re_first()`, but that's overkill and hides the intent.
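For now, a small wrapper is probably the least intrusive stopgap. Here is a minimal sketch, assuming two hypothetical helpers named `strip_or_none` and `strip_all`. They are not part of Parsel and only post-process whatever `get()` and `getall()` return. The sample strings are made up for illustration.

```python
from typing import Iterable, List, Optional


def strip_or_none(value: Optional[str]) -> Optional[str]:
    """Strip a single match; pass None through (as when .get() finds nothing)."""
    return value.strip() if value is not None else None


def strip_all(values: Iterable[str]) -> List[str]:
    """Strip every match, e.g. the list returned by .getall()."""
    return [value.strip() for value in values]


# Plain strings stand in for the results of .get() and .getall() here.
author = strip_or_none("\n  Jane Doe \n")
missing = strip_or_none(None)
authors = strip_all(["  Jane Doe\n", "\tJohn Roe  "])
```

With a real selector this would read `strip_or_none(selector.css("...::text").get())`. That works, but it wraps the call instead of chaining, which is exactly why a built-in `strip=True` would be nicer.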

Best regards, Benoit

bblanchon, Aug 22 '22 13:08

PRs #260 and #127 have gone stale. Will either of them ever get merged? I can't imagine I'm the only person calling .strip() on scraped strings.

bblanchon, Aug 11 '23 16:08