Paul Tremberth comments

Results: 81 comments by Paul Tremberth

Scrapy capitalizes headers for request

@kmike , you mean `Headers` should store keys as-is and still allow case-insensitive lookups?
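A minimal sketch of what "store keys as-is and still allow case-insensitive lookups" could look like. `CasePreservingHeaders` is a hypothetical class written for illustration; it is not Scrapy's `Headers` implementation, and it assumes string keys only.

```python
from collections.abc import MutableMapping


class CasePreservingHeaders(MutableMapping):
    """Hypothetical mapping: remembers each key's spelling, looks it up case-insensitively."""

    def __init__(self, initial=None):
        # Maps lowercased key -> (key as last written, value).
        self._store = {}
        for key, value in (initial or {}).items():
            self[key] = value

    def __setitem__(self, key, value):
        self._store[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        # Iteration yields the keys with their original spelling.
        return (original for original, _ in self._store.values())

    def __len__(self):
        return len(self._store)


headers = CasePreservingHeaders({"x-api-KEY": "secret"})
print(headers["X-Api-Key"])  # lookup ignores case
print(list(headers))         # key spelling is preserved
```

Keeping the original spelling next to the value means the header can be sent exactly as the user wrote it, while lookups stay case-insensitive.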

Work around incorrect extraction of "reserved" HTML entities

Thanks for reporting @immerrr ! It does not look straightforward to fix though. `html5lib` [does the replacement clearly](https://github.com/html5lib/html5lib-python/blob/a3022dcea691780d300547bbf68b4dd921995d1c/html5lib/_tokenizer.py#L90), while with libxml2 `HTMLParser` it seems [this case is not handled](https://github.com/GNOME/libxml2/blob/d8083bf77955b7879c1290f0c0a24ab8cc70f7fb/HTMLparser.c#L2580). Maybe...
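To illustrate the kind of replacement in question, here is a small sketch using Python's standard `html.unescape`, which follows the HTML5 rules for numeric character references. It assumes the "reserved" entities in this thread are numeric references in the 0x80-0x9F range; the thread excerpt does not spell that out, and this is neither html5lib nor lxml code.

```python
from html import unescape

# Assumption: the problematic input is a numeric reference such as &#128;.
# Under the HTML5 rules, references in the 0x80-0x9F range are remapped to
# the characters Windows-1252 assigns to those bytes, not to C1 controls.
euro = unescape("&#128;")
en_dash = unescape("&#150;")

print(repr(euro))
print(repr(en_dash))
```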

What do you think about Selector(response).xpath().map() ?

I'm also -0 on this one. I prefer comprehensions myself.

Make sel.xpath('.') work the same for text elements

Hey @Gallaecio , I'd also want to see this. Also, I believe the issue is with `lxml` and not `libxml2` (and not parsel either): `lxml` text nodes do not accept...

Make sel.xpath('.') work the same for text elements

Related: https://bugs.launchpad.net/lxml/+bug/996134

[Feature Request] Add support for JMESPath

Offering `.json()/.jmespath()/.jsonpath()` for a Selector instantiated with a JSON string, with `type="json"`? Why not. Being able to chain JSON selectors? Why not as well. But I don't see a compelling...

Bad HTML parsing

See the related https://github.com/scrapy/parsel/pull/54, which adds a `parser_cls` attribute to customize the parser. Note that scrapy/parsel favors speed (lxml) over browser-parsing compliance: html5lib is still much slower than lxml (as far...

Scrapy Realtime Execution

@netconstructor , I have only just discovered SpiderKeeper, so I don't know what it provides for your use case. But you can check out https://github.com/scrapinghub/scrapyrt: > You simply run Scrapyrt in...

added SplashHtmlResponse (Fixed #114)

Thanks @atultherajput ! I think you can also set `SplashHtmlResponse` for `'application/xhtml+xml'` and `'application/vnd.wap.xhtml+xml'` so as to follow [what Scrapy does](https://github.com/scrapy/scrapy/blob/129421c7e31b89b9b0f9c5f7d8ae59e47df36091/scrapy/responsetypes.py#L18).
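A rough sketch of that suggestion: treat the XHTML content types the same way as `text/html` when picking a response class. The two stub classes and `response_class_for` are hypothetical stand-ins for illustration only; they are not the scrapy-splash or Scrapy implementations.

```python
class PlainResponseStub:
    """Hypothetical stand-in for a generic response class."""


class HtmlResponseStub(PlainResponseStub):
    """Hypothetical stand-in for an HTML-aware response class."""


# Content types handled as HTML: text/html plus the two XHTML types above.
HTML_CONTENT_TYPES = {
    "text/html",
    "application/xhtml+xml",
    "application/vnd.wap.xhtml+xml",
}


def response_class_for(content_type):
    """Pick a response class from a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mimetype = content_type.split(";", 1)[0].strip().lower()
    if mimetype in HTML_CONTENT_TYPES:
        return HtmlResponseStub
    return PlainResponseStub


print(response_class_for("application/xhtml+xml; charset=utf-8").__name__)
```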

added SplashHtmlResponse (Fixed #114)

@kmike , what do you think of this PR now?