
Is it possible to include support for the html5lib parser?

Open dimmg opened this issue 7 years ago • 4 comments

Sometimes the HTML is malformed in ways that neither html.parser nor lxml can handle. In these cases, html5lib might help.

dimmg, Apr 30 '18 13:04

Here's an example of this problem from a Stack Overflow question:
https://stackoverflow.com/q/52699466/

It would be nice to be able to choose the parser by passing in a string like this: HTMLSession(parser='html5')

This could be implemented like PyQuery.fromstring() https://github.com/gawel/pyquery/blob/1dd000c941ff3228606b2796feca55bbc9671b7a/pyquery/pyquery.py#L86
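A parser option that dispatches on a name, in the spirit of `PyQuery.fromstring()`, could look roughly like the following. This is a minimal sketch, not requests-html's actual API: `parse_html`, `PARSERS`, `_parse_lxml` and `_parse_html5` are hypothetical names, and it assumes `lxml` is installed (plus `html5lib` when `parser='html5'` is selected).

```python
from typing import Callable, Dict


# Hypothetical backends, one per parser name. Imports happen inside each
# function so an optional dependency (html5lib) is only needed when chosen.
def _parse_lxml(html: str):
    import lxml.html
    return lxml.html.fromstring(html)


def _parse_html5(html: str):
    import html5lib
    # treebuilder="lxml" makes html5lib build an lxml tree, so callers can
    # still use lxml methods such as .xpath() on the result.
    # namespaceHTMLElements=False keeps tag names plain ("p" rather than
    # "{http://www.w3.org/1999/xhtml}p").
    tree = html5lib.parse(html, treebuilder="lxml", namespaceHTMLElements=False)
    return tree.getroot()


# Hypothetical registry mapping the user-facing name to a backend.
PARSERS: Dict[str, Callable[[str], object]] = {
    "html": _parse_lxml,
    "html5": _parse_html5,
}


def parse_html(html: str, parser: str = "html"):
    """Parse `html` with the backend named by `parser` (hypothetical API)."""
    try:
        backend = PARSERS[parser]
    except KeyError:
        raise ValueError(
            f"unknown parser {parser!r}; expected one of {sorted(PARSERS)}"
        ) from None
    return backend(html)
```

With something like this in place, an option such as `HTMLSession(parser='html5')` would only need to store the name and pass it through to wherever the page gets parsed.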

haakenlid, Oct 08 '18 12:10

Yes, I posted that problem, and this feature is necessary. Exposing the parser would be a good solution. :)

xzycn, Oct 08 '18 12:10

Is anyone willing to implement this?

oldani, Feb 26 '19 15:02

This package doesn't currently work on an M1 Mac, partly because there's no alternative to lxml.

rkhwaja, Jan 25 '21 09:01