web-poet
web-poet copied to clipboard
Guidelines for using item_cls_fields: True or False by default?
@gatufo has some concerns about the current guidelines:
https://github.com/scrapinghub/web-poet/pull/53#issuecomment-1203902479
For me when combining both Item and Page Objects, Item defines the fields that what you want to extract, and Page Objects become just providers of those fields. That's why I think the default value should be True here to allow (or encourage) this use by default.
I think that the current guidelines, which encourage using item_cls_fields=False
by default in some scenarios, have a practical benefit: they allow to detect more errors. Unless there are some practical issues with this approach I'm -1 to change the current guidelines, but I might be missing something (as often happens :).
What are the scenarios where the current guidelines would lead to worse code, or more undetected errors?
I'm inclined with having item_cls_fields=False
as the default because it throws off an explicit error for undefined fields.
However, let's explore this in the perspective of POs as simply "filling up" items with values.
Suppose we have the following items with the corresponding fields:
-
Item A having these fields:
foo
,bar
,zen
-
Item B having these fields:
foo
,bar
,
Moreover, let's suppose that we have a PO with the following methods defined with the @field
decorator:
-
foo
-
bar
-
extra
Having item_cls_fields=False
would lead to these errors:
-
__init__()
got an unexpected keyword argument 'extra'. -
Item A having
__init__()
missing 1 required positional argument: 'zen'.
Setting item_cls_fields=True
wouldn't raise an exception at all for (1). This is convenient because:
- Both Item A and Item B might NOT want to extract the field named
extra
. This could be useful if such a field requires an additional request. If the error is raised by theitem_cls_fields=False
default, then it might encourage developers to define theextra
field in the item to make the error go away. They have the option to overrideitem_cls_fields=True
but that's another extra step. It might occur to them that they could just define theextra
field just in case they'd need it sometime in the future (without a proper idea how much it costs to extract that field).
But it would still raise an exception for (2), unless it's defined as zen: Optional[str] = None
.
If we set item_cls_fields=False
as the default, it'd error out on the first run if the items have missing fields like in point (1). For me, this would be okay since it forces users to contemplate if this is indeed the expected behavior or should users switch to item_cls_fields=True
for the specific use case. This is consistent with the points mentioned in https://web-poet.readthedocs.io/en/latest/advanced/fields.html#error-prevention.
Conversely, if we set item_cls_fields=True
, an exception like in point (1) wouldn't be raised at all. Users could miss out on the fact that some field from the PO hasn't been utilized. Most probably, item_cls_fields=False
wouldn't come up to the user's mind since no warning or exceptions were raised at all.
If we're using the POs to fill-up item classes with values, then we need to ensure that it's indeed the expected behavior by explicitly overriding item_cls_fields=True
.
TL;DR My vote is to keep item_cls_fields=False
as the default.
It's released with skip_nonitem_fields as False by default.