web-poet
Web scraping Page Objects core library
There is `RulesRegistry.page_cls_for_item` that gives you the page object class to use given an output item class. It would be nice to have a method that, in addition to supporting...
As found in #134, if ZyteItemAdapter is added to ItemAdapter when the fixture is generated, like documented at https://zyte-common-items.readthedocs.io/en/latest/setup.html#configuration , the generated output.json will not contain fields that are empty,...
Builds on top of https://github.com/scrapinghub/web-poet/pull/120. This is related to #115, which implements [approach 3](https://github.com/scrapinghub/web-poet/issues/115#issuecomment-1408526596). This is an alternative to https://github.com/scrapinghub/web-poet/pull/118 wherein: - `SelectFields` is now an _optional_ dependency to `ItemPage`....
Similar to https://github.com/scrapinghub/web-poet/pull/63 but this requires the field to be controlled with `@field(disabled=True)`. It also requires that the disabled field be available in the item class. ``` python class ArticlePage(WebPage[Article]):...
I tried to use scrapy-poet's savefixture together with the Product item from zyte-common-items. The result: meta.json: ``` { "frozen_time": "2023-01-31T17:25:54.362413+00:00" } ``` output.json: ``` "metadata": { "dateDownloaded": "2023-01-31T17:25:55Z", "probability": 1.0 },...
Stemming from https://github.com/scrapinghub/scrapy-poet/pull/111 where we'd want to implement the API in **web-poet** itself regarding extracting data from a subset of fields. # API The main directives that we want to...
Changes: - Implement a `SwitchPage` class that can be subclassed to create special page object classes that call other page object classes based on the received input. - New documentation...
POC for now. Used in https://github.com/zytedata/zyte-common-items/pull/21.