requests-html
Is it possible to add support for the html5lib parser?
Sometimes there are malformed HTML structures that neither html.parser nor lxml can handle.
In these cases html5lib might help.
Here's an example of this problem from a Stack Overflow question:
https://stackoverflow.com/q/52699466/
It would be nice to be able to choose the parser by passing in a string like this: HTMLSession(parser='html5').
This could be implemented the way PyQuery.fromstring() does it: https://github.com/gawel/pyquery/blob/1dd000c941ff3228606b2796feca55bbc9671b7a/pyquery/pyquery.py#L86
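A minimal sketch of the string-to-parser dispatch, loosely modeled on how PyQuery.fromstring() picks a parser by name. The names `PARSERS` and `resolve_parser` are hypothetical and not part of requests-html; a `parser='html5'` keyword on HTMLSession could forward its value to a function like this. `lxml.html.fromstring` and `lxml.html.html5parser.fromstring` are real lxml functions, and the latter also needs html5lib installed. Importing lazily means html5lib is only required when someone actually asks for it.

```python
import importlib
from typing import Callable, Dict, Tuple

# Hypothetical registry: user-facing parser name -> (module path, function name).
# Modules are imported only when a parser is requested, so html5lib stays optional.
PARSERS: Dict[str, Tuple[str, str]] = {
    "html": ("lxml.html", "fromstring"),               # current lxml behaviour
    "html5": ("lxml.html.html5parser", "fromstring"),  # requires html5lib
}


def resolve_parser(name: str = "html") -> Callable:
    """Return the parsing callable registered under `name`, importing it lazily."""
    try:
        module_name, func_name = PARSERS[name]
    except KeyError:
        raise ValueError(f"No such parser: {name!r}") from None
    module = importlib.import_module(module_name)
    return getattr(module, func_name)
```

With lxml and html5lib installed, `resolve_parser("html5")(html_text)` would parse `html_text` with html5lib's error-tolerant algorithm while still returning an lxml element, so the rest of the library's lxml-based code would not need to change.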
Yes, I left that problem, and this feature is necessary. Exposing the parser would be a good solution. :)
Is anyone willing to implement this?
This package doesn't currently work on an M1 Mac, partly because there's no alternative to lxml.