add support for Parsel

Open Ehsan-U opened this issue 1 year ago • 6 comments

BeautifulSoup lacks proper type hints (most of its API is typed as Any), so IDE autocompletion isn't effective. A solid alternative is Parsel. It supports CSS selectors, XPath expressions for HTML and XML, JMESPath for JSON documents, and regular expressions. Additionally, Parsel is the parser used by Scrapy.

Ehsan-U avatar Jul 21 '24 13:07 Ehsan-U

I was thinking about selectolax

siddiqkaithodu avatar Jul 21 '24 16:07 siddiqkaithodu

selectolax supports neither XPath selectors nor JMESPath for JSON.

Ehsan-U avatar Jul 21 '24 16:07 Ehsan-U

We started out with BeautifulSoup because of its popularity, but you're right that it has its shortcomings. Adding support for either selectolax or parsel as a new crawler type should be fairly easy - we'll consider it.

janbuchar avatar Jul 22 '24 15:07 janbuchar

+1 for Parsel

asymness avatar Jul 22 '24 18:07 asymness

@janbuchar, I'd like to help out by adding Parsel support as a new crawler type. Would you be open to a PR from me for this?

asymness avatar Jul 22 '24 18:07 asymness

Absolutely :slightly_smiling_face:

janbuchar avatar Jul 22 '24 20:07 janbuchar