Deprecate scrapy.Item

Open Gallaecio opened this issue 7 months ago • 6 comments

Just thinking out loud.

With itemadapter allowing Scrapy items to be defined as dataclasses, attrs classes or pydantic models, I wonder if it makes sense to keep the scrapy.Item class.

It seems to me that the only thing it brings to the table is reference tracking. I am not sure if that feature is enough to justify its existence. And if we really want that feature in items, we could probably give guidelines to make it work with most other item types, e.g. through some mixin.
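A minimal sketch of what such a mixin could look like, using only the standard library. This is not Scrapy's implementation; `TrackRefMixin`, `live_count` and `ProductItem` are hypothetical names chosen for illustration, and the registry only mimics the idea of keeping weak references to live objects per class.

```python
import weakref
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry: class -> weak references to its live instances.
# Keys are id(obj) because dataclasses with eq=True are unhashable.
_live_refs = defaultdict(weakref.WeakValueDictionary)


class TrackRefMixin:
    """Hypothetical mixin that records every live instance of a class."""

    def __new__(cls, *args, **kwargs):
        obj = super().__new__(cls)
        # The entry disappears on its own once the object is collected.
        _live_refs[cls][id(obj)] = obj
        return obj


def live_count(cls):
    """Return how many instances of cls are currently alive."""
    return len(_live_refs[cls])


@dataclass
class ProductItem(TrackRefMixin):
    # Illustrative fields, not taken from any real project.
    name: Optional[str] = None
    price: Optional[float] = None
```

With this in place, `live_count(ProductItem)` goes up when an item is created and back down once the item is garbage collected, which is roughly the kind of information reference tracking is meant to expose.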

As for the cons of scrapy.Item, I feel like dataclasses, attrs and pydantic offer a nicer syntax with no drawbacks and with better IDE support for certain things.

This mostly comes from the realization that adding type hints to scrapy.Item fields, so that ItemAdapter.get_json_schema() can pick them up, causes an IDE to complain about type mismatches, since Field instances are not of whatever type you annotate. And while you could define the JSON Schema type through additional metadata instead of a type hint, that’s more cumbersome.
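To make the mismatch concrete, here is a small sketch that avoids importing Scrapy. The `Field` class below is a stand-in (scrapy.Field is likewise a dict subclass), and `ProductItem` and `ProductData` are hypothetical names; the point is only that the annotation says one type while the assigned value is another.

```python
from dataclasses import dataclass
from typing import Optional


class Field(dict):
    """Stand-in for scrapy.Field, which is also a dict of field metadata."""


class ProductItem:
    # Stand-in for a scrapy.Item subclass. The annotation promises a str,
    # but the value assigned is a Field, so a type checker flags this line.
    name: str = Field()


@dataclass
class ProductData:
    # In a dataclass the annotation and the default agree, so there is
    # nothing for the type checker to complain about.
    name: Optional[str] = None
```

The code runs fine either way; the complaint is purely static, which is why it shows up in the IDE rather than at runtime.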

Gallaecio avatar Aug 22 '25 11:08 Gallaecio

There are also itemloaders.

wRAR avatar Aug 22 '25 12:08 wRAR

You mean as something to deprecate as well, or as a reason not to deprecate scrapy.Item?

Gallaecio avatar Aug 22 '25 12:08 Gallaecio

I meant the latter, but now I've checked the docs and I see that "Under the hood, itemloaders uses itemadapter as a common interface. This means you can use any of the types supported by itemadapter here." So it's not a feature specific to scrapy.Item.

I wonder what we should suggest as the easiest way to remove the deprecation warning. Should it be dataclasses?

wRAR avatar Aug 22 '25 14:08 wRAR

> I wonder what we should suggest as the easiest way to remove the deprecation warning. Should it be dataclasses?

If we go ahead with this, I don’t think we need to recommend a specific replacement. We can let the user figure out which one fits their needs better. I do think dataclasses is a natural replacement, though.

Gallaecio avatar Aug 22 '25 14:08 Gallaecio

I just think that there is a ton of existing item classes in the wild and it would be nice to have a straightforward guide for replacing them.

wRAR avatar Aug 22 '25 17:08 wRAR

We've discussed this with @kmike. Key points:

  1. Items have a benefit over dataclasses by having all fields optional by default. Though dicts also have this.
  2. @kmike thinks we cannot remove scrapy.Item as everyone uses it. In that case we shouldn't deprecate it either.
  3. We should make sure our docs talk about itemadapter instead of "hardcoding" Items and dicts as old docs would do.
  4. If we want to promote dataclasses over Items we should make sure our docs and tutorials don't use the latter (though simpler ones use dicts, I guess we should keep those).
  5. When I tried to assess or make a PoC change for this, I found that many tests use Items; we can move those to dataclasses if we want. Many tests also use Items, dicts and dataclasses together; we can either keep those or refactor them so that dataclasses are the "primary" thing (this should be purely code organization, no coverage change).
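Point 1 can be illustrated with a short sketch using plain dataclasses. The class names and fields are hypothetical; the sketch only shows that dataclass fields are required unless given defaults, and that giving every field a `None` default is one way to approximate the "all fields optional" behavior.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StrictProduct:
    # No defaults: every field must be passed when creating the item.
    name: str
    price: float


@dataclass
class LenientProduct:
    # Every field defaults to None, so any subset can be provided.
    name: Optional[str] = None
    price: Optional[float] = None


def try_build_strict():
    """Return the error message raised when a required field is missing."""
    try:
        StrictProduct(name="Widget")
    except TypeError as exc:
        return str(exc)
    return None


partial = LenientProduct(name="Widget")
```

One difference worth keeping in mind: in `partial` the missing `price` is present with the value `None`, whereas an unset field of a scrapy.Item or a missing dict key is simply absent. Whether that matters probably depends on how the items are exported or validated downstream.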

wRAR avatar Nov 27 '25 15:11 wRAR