
Small improvement to the docs for setting ITEM_PIPELINES

Open intotecho opened this issue 6 years ago • 4 comments

In the docs

https://github.com/scrapy/scrapy/blob/65d631329a1434ec013f24341e4b8520241aec70/scrapy/templates/project/module/pipelines.py.tmpl

It says, in the comments:

Define your item pipelines here

Don't forget to add your pipeline to the ITEM_PIPELINES setting

See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html

Please change the instruction to:

Don't forget to add your pipeline to the ITEM_PIPELINES setting in settings.py

I added the setting to my spider's init, and it was hard to find out what was going wrong. Mentioning settings.py would help others who make the same mistake.

intotecho avatar May 14 '19 06:05 intotecho
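
As an illustration of the request above (not taken from the thread), here is a minimal sketch of the entry as it might appear in a project's settings.py. The import path `myproject.pipelines.MyPipeline` is a hypothetical name, and the priority 300 is an arbitrary choice.

```python
# settings.py (sketch; "myproject.pipelines.MyPipeline" is a hypothetical path)
# Keys are import paths of pipeline classes. Values are integers that set
# the run order: lower numbers run first, conventionally within 0-1000.
ITEM_PIPELINES = {
    "myproject.pipelines.MyPipeline": 300,
}
```

A pipeline that is not listed in this setting, wherever the setting ends up being defined, is not enabled, which matches the symptom described in the report.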

The thing is, settings.py is just one of the places where it can be defined, and I think listing every place where settings can be defined each time the documentation mentions a setting would make things too verbose.

I understand your frustration, but I’m not sure how we can improve things. Users are expected to have read https://docs.scrapy.org/en/latest/topics/settings.html#populating-the-settings by the time they look up specific settings in the documentation.

Gallaecio avatar May 14 '19 07:05 Gallaecio

Out of curiosity, why does an Item need to be declared in ITEM_PIPELINES in order to be processed?

I'm learning scrapy and this just bit me -- I was yielding an Item subclass with a process_item method, but process_item wasn't called until I added my class to ITEM_PIPELINES.

This was counterintuitive to me as a learner -- is there a reason someone would yield an Item without wanting its process method to be called?

Related issue for the pipeline docs: #2350

I kind of agree with 2350 -- I'm an experienced python programmer, but it took me a while to figure out the item pipeline from docs. I couldn't find a complete example -- the entire 'item pipelines' docs page, for example, doesn't have the yield keyword anywhere. A small self-contained example (which includes the ITEM_PIPELINES reminder) would have helped a lot.

Happy to submit a (small) docs PR if helpful, but fair warning I'm not a scrapy expert.

abe-winter avatar Jan 26 '23 23:01 abe-winter

> I'm learning scrapy and this just bit me -- I was yielding an Item subclass with a process_item method, but process_item wasn't called until I added my class to ITEM_PIPELINES.
>
> This was counterintuitive to me as a learner -- is there a reason someone would yield an Item without wanting its process method to be called?

This is the first time I have heard of someone defining a process_item method on an item class itself. Item pipeline classes are intended to be separate from item classes; it is not customary to use an item class also as an item pipeline.

Gallaecio avatar Jan 27 '23 12:01 Gallaecio
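
To make the separation described above concrete, here is a small sketch with the item and the pipeline as two unrelated classes. The names `QuoteItem` and `StripWhitespacePipeline` are hypothetical, and the pipeline is called by hand here only to show its contract. In a real project Scrapy would make that call, and only once the class is enabled in ITEM_PIPELINES.

```python
from dataclasses import dataclass


@dataclass
class QuoteItem:
    """Holds scraped data only; no processing logic lives here."""
    text: str
    author: str


class StripWhitespacePipeline:
    """A separate class whose process_item receives each item and returns it."""

    def process_item(self, item, spider):
        # Clean the fields in place, then hand the item on to the next pipeline.
        item.text = item.text.strip()
        item.author = item.author.strip()
        return item


# Calling the method directly, just to show what the pipeline does to an item.
item = StripWhitespacePipeline().process_item(
    QuoteItem(text="  Hi  ", author=" Ann "), spider=None
)
```

Keeping the two apart means the same item class can pass through several pipelines, and a pipeline can handle several item types.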

ahhh that makes sense -- I misunderstood the API here

fwiw it would really help to add an end-to-end example in the 'item pipelines' docs page

one that included yielding from a spider

abe-winter avatar Jan 27 '23 16:01 abe-winter
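
As a rough picture of the end-to-end flow asked for above, the following is a plain-Python simulation and deliberately imports nothing from Scrapy. The `parse` generator stands in for a spider callback, and the `run` function is a hypothetical stand-in for the work the Scrapy engine does: it passes each yielded item through the enabled pipelines in ascending priority order. All names and data here are illustrative assumptions.

```python
def parse(rows):
    """Stand-in for a spider's parse(): yield one item per scraped row."""
    for row in rows:
        yield {"title": row["title"], "price": row["price"]}


class PriceToFloatPipeline:
    def process_item(self, item, spider):
        # Convert a string such as "$12.50" into the float 12.5.
        item["price"] = float(item["price"].lstrip("$"))
        return item


class AddCurrencyPipeline:
    def process_item(self, item, spider):
        item["currency"] = "USD"
        return item


# Mirrors the shape of the ITEM_PIPELINES setting. Classes are used as keys
# instead of import-path strings only to keep this sketch self-contained.
ITEM_PIPELINES = {AddCurrencyPipeline: 400, PriceToFloatPipeline: 300}


def run(rows):
    """Hypothetical stand-in for the engine: lower priority numbers run first."""
    ordered = sorted(ITEM_PIPELINES.items(), key=lambda pair: pair[1])
    pipelines = [cls() for cls, _priority in ordered]
    results = []
    for item in parse(rows):  # items yielded by the "spider"
        for pipeline in pipelines:
            item = pipeline.process_item(item, spider=None)
        results.append(item)
    return results


items = run([{"title": "Book", "price": "$12.50"}])
```

Removing a class from the `ITEM_PIPELINES` dict in this sketch means its `process_item` is never called, which is the behaviour that surprised the commenters in this thread.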