web-poet icon indicating copy to clipboard operation
web-poet copied to clipboard

@field(extra=True)

Open kmike opened this issue 1 year ago • 1 comments

This feature allows to declare a field as @field(extra=True). Such fields are ignored if the item type doesn't have them. It allows to define fields in the base class, which are not in the base item, and let items in subclasses use such fields.

class FooPage(ItemPage):
    item_cls = StandardItem

    @field
    def standard_field1(self):
        return ...
   
     # .. all other fields   

    @field(extra=True)
    def non_standard_field(self):
        # this field is not in the StandardItem
        return ...

    def to_item(self):
        return item_from_fields_sync(self, self.item_cls)


@attrs.define
class ExtraItem(StandardItem):
    non_standard_field: Optional[str]


class CustomFooPage(FooPage):
    item_cls = ExtraItem

This PR is opened to get feedback. It partially addresses https://github.com/scrapinghub/web-poet/issues/59. But I don't like this approach (-1 to merge this PR); there are at least two issues with it:

  1. FooPage now "knows" about non_standard_field attribute name, which is only defined in a non-standard item.
  2. It doesnt solve a problem of removing or renaming of the fields in subclasses, unless you define all fields as extra=True in the base class. Defining them all as extra=True defeats the purpose of having item_cls_fields=False as a default, as it disables all validation.

To solve (1), one can avoid using fields, and use a regular method/property, though with more code:

class FooPage(ItemPage):
    # ...
    def get_non_standard_field(self):
        # this field is not in the StandardItem
        return ...

class CustomFooPage(FooPage):
    item_cls = ExtraItem

    @field
    def non_standard_field(self):
        return self.get_non_standard_field()

kmike avatar Aug 12 '22 09:08 kmike

Codecov Report

Merging #63 (237f7cd) into master (554bbfa) will increase coverage by 0.20%. The diff coverage is 100.00%.

:exclamation: Current head 237f7cd differs from pull request most recent head b86489c. Consider uploading reports for the commit b86489c to get more accurate results

@@             Coverage Diff             @@
##           master       #63      +/-   ##
===========================================
+ Coverage   99.79%   100.00%   +0.20%     
===========================================
  Files          17        17              
  Lines         488       501      +13     
===========================================
+ Hits          487       501      +14     
+ Misses          1         0       -1     
Impacted Files Coverage Δ
web_poet/fields.py 100.00% <100.00%> (+2.56%) :arrow_up:

codecov[bot] avatar Aug 12 '22 10:08 codecov[bot]