web-poet

Web scraping Page Objects core library

Results 38 web-poet issues

There is `RulesRegistry.page_cls_for_item` that gives you the page object class to use given an output item class. It would be nice to have a method that, in addition to supporting...

enhancement

As found in #134, if ZyteItemAdapter is added to ItemAdapter when the fixture is generated, as documented at https://zyte-common-items.readthedocs.io/en/latest/setup.html#configuration, the generated output.json will not contain fields that are empty,...

Builds on top of https://github.com/scrapinghub/web-poet/pull/120. This is related to #115, which implements [approach 3](https://github.com/scrapinghub/web-poet/issues/115#issuecomment-1408526596). This is an alternative to https://github.com/scrapinghub/web-poet/pull/118 wherein: - `SelectFields` is now an _optional_ dependency of `ItemPage`....

Similar to https://github.com/scrapinghub/web-poet/pull/63 but this requires the field to be controlled with `@field(disabled=True)`. It also requires that the disabled field be available in the item class. ``` python class ArticlePage(WebPage[Article]):...

I tried to use scrapy-poet's savefixture together with Product item from zyte-common-items. The result: meta.json: ``` { "frozen_time": "2023-01-31T17:25:54.362413+00:00" } ``` output.json: ``` "metadata": { "dateDownloaded": "2023-01-31T17:25:55Z", "probability": 1.0 },...

Stemming from https://github.com/scrapinghub/scrapy-poet/pull/111 where we'd want to implement the API in **web-poet** itself regarding extracting data from a subset of fields. # API The main directives that we want to...

discuss

Changes: - Implement a `SwitchPage` class that can be subclassed to create special page object classes that call other page object classes based on the received input. - New documentation...

POC for now. Used in https://github.com/zytedata/zyte-common-items/pull/21.