scrapy-zyte-api
Zyte API integration for Scrapy
To do: - [ ] Confirm the user-facing API is as agreed with @VMRuiz and @proway2. - [x] Make existing tests pass. - [x] Restore compatibility with Scrapy 2.0.1+. -...
The downloader middleware of scrapy-zyte-api was created to prevent AutoThrottle from affecting requests driven through Zyte API, and instead let Zyte API itself control throttling on the server side, sending...
## Background Retries issued by `zyte_api.aio.retry.RetryFactory` are somewhat hidden. They are logged as DEBUG messages (so they are not seen by default in new projects with LOG_LEVEL: INFO) and, I...
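The visibility problem described above can be reproduced with the standard library alone. The following is a minimal sketch, not code from `zyte_api`: the logger name `retry_demo`, the `ListHandler` class, and the message texts are all hypothetical, chosen only to show that a message logged at DEBUG level is dropped when the effective level is INFO.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that collects formatted messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def visible_messages(level):
    """Return the messages that survive when the logger is set to `level`."""
    # "retry_demo" is an illustrative logger name, not one used by zyte_api.
    logger = logging.getLogger("retry_demo")
    logger.propagate = False
    logger.handlers.clear()
    handler = ListHandler()
    logger.addHandler(handler)
    logger.setLevel(level)

    # Stand-in for a retry notice logged at DEBUG level.
    logger.debug("Retrying request (attempt 2)")
    # Stand-in for an ordinary INFO message.
    logger.info("Spider opened")
    return handler.messages


if __name__ == "__main__":
    print(visible_messages(logging.INFO))
    print(visible_messages(logging.DEBUG))
```

With the level set to INFO, only the INFO message is collected; the retry notice appears only once the level is lowered to DEBUG.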
Resolves #118, resolves #119, resolves #120. To do: - [x] Test both snippets manually, make sure they work as expected.
I have seen 2 people now having trouble with HTTP cache in combination with scrapy-zyte-api. They set `HTTPCACHE_ENABLED` to `True`, and they get `NotSupported("Response content isn't text")`. I could not...
To do: - [x] Update after https://github.com/scrapy-plugins/scrapy-zyte-api/pull/150 - [x] Solve conflicts. - [x] Complete coverage. Fixes #243.
In the example below, `ZyteApiProvider` makes 2 API requests instead of 1: ```py @handle_urls("example.com") @attrs.define class MyPage(ItemPage[MyItem]): html: BrowserHtml # ... class MySpider(scrapy.Spider): # ... def parse(self, response: DummyResponse, product:...
Even if `httpResponseHeaders` is not `True`, if the actual response data is plain text, we should interpret it as such.
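One way to picture the idea is a heuristic that inspects the body bytes directly when no headers are available. This is only a sketch under stated assumptions: `looks_like_text` is a hypothetical helper, not part of scrapy-zyte-api or Scrapy, and it assumes UTF-8 as the only candidate encoding.

```python
def looks_like_text(body: bytes) -> bool:
    """Hypothetical heuristic: guess whether a response body is plain text.

    Assumes UTF-8; a real implementation would likely consider other
    encodings and any headers that are available.
    """
    # A NUL byte is a strong hint that the payload is binary.
    if b"\x00" in body:
        return False
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Accept printable characters plus common whitespace control characters.
    return all(ch.isprintable() or ch in "\t\n\r" for ch in text)


if __name__ == "__main__":
    print(looks_like_text(b"hello\nworld"))
    print(looks_like_text(b"\x89PNG\r\n\x1a\n\x00"))
```

A check like this trades accuracy for simplicity: it can misclassify text in other encodings as binary, which is why it is presented as an illustration rather than a proposed fix.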
When looking at the list of spider jobs in Scrapy Cloud, there's a column showing the spider's `close_reason` message. Some users have pointed out that it's not clear from this...