
passing `meta` parameters in distributed backends mode for sqlalchemy

Open wetneb opened this issue 9 years ago • 7 comments

Hi, I do not understand how to set meta parameters on a frontier Request generated from a seeder. There seem to be two kinds of meta parameters: frontier ones and Scrapy ones. I would like to set Scrapy meta parameters so that my Scrapy middlewares can see them. It seems they have to be set as meta['scrapy_meta'] = my_scrapy_meta, but by the time the request arrives in my middleware these parameters have disappeared (only the 'frontier_request' key remains). Any idea where this comes from? Should I convert my middleware into a Frontier middleware (one that works on frontier Requests)? Thanks a lot!

wetneb avatar Jun 20 '16 20:06 wetneb

Seed loaders are Scrapy spider middlewares. All the same rules should apply as to Scrapy middlewares. I need to know your Frontera cluster setup: backends, message bus and run mode to help you.

sibiryakov avatar Jun 21 '16 13:06 sibiryakov

Thanks a lot for your reply! I'm using the distributed setup with ZeroMQ, and the default run mode. I can see that the meta parameters I introduce in the seeder are still available when the requests arrive in the DB and strategy workers.

What is the status of the converters (https://github.com/scrapinghub/frontera/blob/master/frontera/contrib/scrapy/converters.py)? Are they involved in the conversion from the frontier request to the Scrapy one? If so, when does that happen?

wetneb avatar Jun 21 '16 21:06 wetneb

@wetneb Which backend do you use? With the HBase backend meta isn't persisted, but with the SQLA backend it is. Converters are used in the spider processes, and conversion happens every time a request is read from Frontera and every time a response is returned to it.
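
To make the symptom in this thread concrete, here is a minimal sketch of the frontier-to-Scrapy step as it is described above. It is a simplified model, not Frontera's converter code: `FrontierRequest`, `to_scrapy_meta` and the `depth_hint` key are hypothetical names, and the only assumptions taken from the thread are that Scrapy-side values live under `meta['scrapy_meta']` and that the frontier request is attached as `'frontier_request'`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class FrontierRequest:
    """Hypothetical stand-in for a frontier request (not Frontera's class)."""
    url: str
    meta: Dict[str, Any] = field(default_factory=dict)


def to_scrapy_meta(frontier_request: FrontierRequest) -> Dict[str, Any]:
    """Model of the conversion: unpack 'scrapy_meta', attach the frontier request."""
    # Assumption: values meant for Scrapy middlewares are nested under 'scrapy_meta'.
    scrapy_meta = dict(frontier_request.meta.get("scrapy_meta", {}))
    scrapy_meta["frontier_request"] = frontier_request
    return scrapy_meta


# Meta survived the trip through the backend: the middleware sees both keys.
seed = FrontierRequest("http://example.com", meta={"scrapy_meta": {"depth_hint": 3}})
print(sorted(to_scrapy_meta(seed)))      # ['depth_hint', 'frontier_request']

# Meta was dropped somewhere upstream: only 'frontier_request' is left.
stripped = FrontierRequest("http://example.com")
print(sorted(to_scrapy_meta(stripped)))  # ['frontier_request']
```

If a middleware only ever sees `'frontier_request'`, that suggests the request reached the spider with an empty frontier meta, so the loss would have happened before the conversion rather than in it.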

sibiryakov avatar Jun 22 '16 10:06 sibiryakov

@sibiryakov Thanks! I'm using frontera.contrib.backends.sqlalchemy.Distributed as a backend, so meta is indeed persisted there. I suspect meta disappears during the conversion process in the spider. I will try to debug that.

wetneb avatar Jun 22 '16 14:06 wetneb

Changing the backend to 'frontera.contrib.backends.sqlalchemy.SQLAlchemyBackend' did indeed solve the issue. However, I still needed to keep the Distributed backend for the strategy worker. Is that normal? And what is the rationale behind keeping meta in one backend but not the other? Thanks a lot anyway!

wetneb avatar Jun 22 '16 20:06 wetneb

@wetneb Oh, that's great that you found it. https://github.com/scrapinghub/frontera/blob/master/frontera/worker/strategies/__init__.py#L90 Meta is not transferred there for historical reasons, but it would make sense to transfer it. PRs are always welcome.
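
As a rough illustration of what "not transferred" means here, the sketch below contrasts a step that rebuilds a request from its URL alone with one that forwards meta. Both functions are hypothetical and operate on plain dicts; they are not the code at the linked line, only a model of the difference a change there could make.

```python
from typing import Any, Dict


def rebuild_without_meta(request: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: create a fresh request from the URL only, so meta is lost."""
    return {"url": request["url"], "meta": {}}


def rebuild_with_meta(request: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: same step, but a copy of the original meta is forwarded."""
    return {"url": request["url"], "meta": dict(request.get("meta", {}))}


original = {"url": "http://example.com", "meta": {"scrapy_meta": {"depth_hint": 3}}}

print(rebuild_without_meta(original)["meta"])  # {}
print(rebuild_with_meta(original)["meta"])     # {'scrapy_meta': {'depth_hint': 3}}
```

Copying the dict rather than sharing it keeps later changes to the rebuilt request from leaking back into the original's top-level meta.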

sibiryakov avatar Jun 23 '16 09:06 sibiryakov

Excellent, I'll try to do that then. Thanks a lot!

wetneb avatar Jun 23 '16 13:06 wetneb