frontera WARNING: unable to serialize object: None

Hello, I followed the quick start tutorial and started the DB worker and the strategy worker. Then I started Scrapy with: scrapy crawl general -L INFO -s FRONTERA_SETTINGS=frontier.spider_settings -s SEEDS_SOURCE=seeds_es_smp.txt -s SPIDER_PARTITION_ID=0

But the crawler does not seem to be working. It keeps logging [frontera.contrib.backends.remote.codecs.msgpack] WARNING: unable to serialize object: None. How can I fix it? Thanks in advance.

Mar 08 '17 10:03 Bundas

Hi Jan,

It's just a warning, it should work fine.

A fix is in progress: https://github.com/scrapinghub/frontera/pull/257

Mar 08 '17 11:03 sibiryakov

@sibiryakov But why doesn't the crawler work then? I give it URLs and it doesn't crawl anything; I just keep getting this warning.

Mar 08 '17 15:03 Bundas

If you want me to help debug it, I need all your configs and spider code.

Here's what you can do yourself:

You need to debug the communication between the DBW, the SW, and the spiders. Start with one instance of each component and check the log output. Make sure that when you add seeds using the spider, they are propagated to the SW via the spider log, that the DBW then consumes the scheduling messages from the scoring log, and finally that the DBW generates a new batch to fetch. All of that should be visible in the console output logs.

Keep in mind that when using ZMQ as the message bus (the default), you can lose messages when one of the components isn't available (not running). If you start with an empty database and seeds, the recommended order for starting the components is SW, DBW, spiders.
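
To illustrate why the start order matters, here is a toy Python model of a message bus with no persistence. It is only a sketch: FireAndForgetBus and its methods are hypothetical names made up for this example and are not Frontera or ZMQ APIs. It only mimics the behaviour described above, where a message published while the consumer is not running is never delivered.

```python
class FireAndForgetBus:
    """Hypothetical in-memory bus: it keeps no backlog for absent consumers."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        # A consumer only sees messages published after this call.
        self.subscribers.append(callback)

    def publish(self, message):
        # Deliver to whoever is connected right now; return the delivery count.
        for callback in self.subscribers:
            callback(message)
        return len(self.subscribers)


bus = FireAndForgetBus()
received = []

dropped = bus.publish("seed-1")      # consumer not started yet: nobody gets it
bus.subscribe(received.append)       # consumer comes up
delivered = bus.publish("seed-2")    # now the message arrives
```

In this model "seed-1" is gone for good, which is the same effect as injecting seeds before the worker that should consume them is up.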

Mar 08 '17 17:03 sibiryakov

@sibiryakov I am using the default configs and default spider code. I just cloned your repository and followed the distributed quick start tutorial here: http://frontera.readthedocs.io/en/latest/topics/quick-start-distributed.html

Mar 08 '17 19:03 Bundas

I did the same, just to make sure, and it works for me. Yes, it floods stdout with warnings, but it works. If you don't want to see the warnings, you could try this: https://github.com/scrapinghub/frontera/pull/268. I hope it will be merged and released soon.
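
Independently of that pull request, the warnings can probably be hidden with the standard Python logging module. The snippet below is a minimal sketch: the logger name is copied from the prefix of the warning line quoted in this thread, silence_codec_warnings is a hypothetical helper name, and it assumes nothing reconfigures that logger afterwards. It only hides the message; it does not change what the codec does.

```python
import logging

# Logger name copied from the prefix of the warning line in this thread.
CODEC_LOGGER = "frontera.contrib.backends.remote.codecs.msgpack"


def silence_codec_warnings(name=CODEC_LOGGER):
    """Raise the logger's threshold so WARNING records are dropped."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.ERROR)
    return logger


codec_logger = silence_codec_warnings()
```

This would need to run in every process that prints the warning, for example near the top of the spider module.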

Mar 14 '17 16:03 sibiryakov

FWIW, I am also getting the same "it appears everything is working, but it's not" behavior when following the distributed-via-Kafka tutorial. I'll post more if I happen to figure it out.

Apr 28 '17 18:04 bresmith-wayfair

Same issue. I followed the "distributed" tutorial.

Dec 11 '17 11:12 desprit

I need someone to share a reproducible example with me, @desprit, @Bundas, @bresmith-wayfair. Unfortunately, this software stack (Scrapy, Frontera) allows for too much customization, and even a tiny detail could break everything. In other words, I need all configs and all source code.

Dec 11 '17 13:12 sibiryakov

@sibiryakov

I will try to help. I'm building a dockerized version of the distributed system (Kafka, HBase). I get a lot of warnings, and what I'm trying to do now is initialize Frontera with custom classes (I simply copied the default MessageBus, strategy worker class, and HBase class) and debug every single step they take. I hope to resolve my issues within a couple of days.

Dec 11 '17 17:12 desprit
