frontera
WARNING: unable to serialize object: None
Hello, I followed the quick start tutorial and started the DB worker and the strategy worker. Then I started Scrapy using scrapy crawl general -L INFO -s FRONTERA_SETTINGS=frontier.spider_settings -s SEEDS_SOURCE=seeds_es_smp.txt -s SPIDER_PARTITION_ID=0
But the crawler does not seem to be working. It keeps throwing [frontera.contrib.backends.remote.codecs.msgpack] WARNING: unable to serialize object: None.
How can I fix it? Thanks in advance.
Hi Jan,
It's just a warning, it should work fine.
A fix is in progress: https://github.com/scrapinghub/frontera/pull/257
@sibiryakov But why doesn't the crawler work then? I give it URLs, and it doesn't crawl anything. I just keep getting this warning.
If you want me to help debug it, I need all your configs and spider code.
Here's what you can do yourself:
You need to debug the communication between the DBW, the SW, and the spiders. Start with one instance of each component and check the log output. Make sure that when you add seeds using the spider, they are propagated to the SW through the spider log, that the DBW then consumes the scheduling messages from the scoring log, and finally that the DBW generates a new batch to fetch. All of that should be visible in the console logs.
Keep in mind that when using ZMQ as the message bus (the default), you can lose messages when one of the components isn't available (not running). If you start with an empty database and seeds, the recommended order for starting the components is SW, DBW, spiders.
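The start order above can be sketched as a small launcher script. This is only an illustration: the worker module paths and the config module names (frontier.strategy_settings, frontier.workersettings) are assumptions that should be checked against your own checkout, and the delay value is arbitrary. Only the scrapy command is taken from this thread.

```python
import subprocess
import time

# Order recommended in this thread: SW first, then DBW, then spiders.
# ASSUMPTION: worker module paths and config names are illustrative only;
# adjust them to match your checkout. The scrapy command is from the thread.
COMPONENTS = [
    ("strategy worker", ["python", "-m", "frontera.worker.strategy",
                         "--config", "frontier.strategy_settings"]),
    ("db worker", ["python", "-m", "frontera.worker.db",
                   "--config", "frontier.workersettings"]),
    ("spider", ["scrapy", "crawl", "general", "-L", "INFO",
                "-s", "FRONTERA_SETTINGS=frontier.spider_settings",
                "-s", "SEEDS_SOURCE=seeds_es_smp.txt",
                "-s", "SPIDER_PARTITION_ID=0"]),
]


def start_in_order(components, delay=5.0, dry_run=False):
    """Start each component in list order, pausing between launches.

    Returns a list of (name, process) pairs; process is None in dry-run mode.
    """
    started = []
    for name, argv in components:
        proc = None
        if not dry_run:
            proc = subprocess.Popen(argv)
            # Give the component time to connect before the next one starts
            # sending messages, since a missing peer can mean lost messages.
            time.sleep(delay)
        started.append((name, proc))
    return started
```

Calling start_in_order(COMPONENTS, dry_run=True) only reports the order without launching anything, which is a cheap way to check the list before running it for real.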
@sibiryakov I am using the default configs and the default spider code. I have just cloned your repository and followed the distributed quick start tutorial here: http://frontera.readthedocs.io/en/latest/topics/quick-start-distributed.html
I did the same, just to make sure, and it works for me. Yes, it floods stdout with warnings, but it works. If you don't want to see the warnings, you could try this: https://github.com/scrapinghub/frontera/pull/268. I hope it will be merged and released soon.
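As a possible stopgap while that change is unreleased, the warning could be filtered on the logging side. The sketch below assumes the message is emitted through Python's standard logging module under the logger name shown in brackets in the warning; it does not touch Frontera itself and is not what the pull request does. The class name DropSerializeWarning is made up for this example.

```python
import logging

# Logger name copied from the bracketed prefix of the warning in this thread.
# ASSUMPTION: the warning is logged directly on this logger via stdlib logging.
NOISY_LOGGER = "frontera.contrib.backends.remote.codecs.msgpack"


class DropSerializeWarning(logging.Filter):
    """Drop only the 'unable to serialize object' records; keep the rest."""

    def filter(self, record):
        # Returning False discards the record; True lets it through.
        return "unable to serialize object" not in record.getMessage()


# A filter attached to a logger applies only to records logged on that
# exact logger, not to records from its child loggers.
logging.getLogger(NOISY_LOGGER).addFilter(DropSerializeWarning())
```

This would need to run in each process that prints the warning, before the crawl starts. It only hides the message, so other warnings from the same logger stay visible.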
fwiw, i am also getting the same "it appears everything is working, but it's not" when following the distributed via kafka tutorial... i'll post more if i happen to figure it out
Same issue. I followed "distributed" tutorial.
I need someone to share a reproducible example with me @desprit, @Bundas, @bresmith-wayfair. Unfortunately, this software stack (Scrapy, Frontera) allows for a lot of customization, and even a tiny detail could break everything. In other words, I need all configs and all source code.
@sibiryakov
I will try to help. I'm building a dockerized version of the distributed system (Kafka, HBase). I get a lot of warnings, and what I'm trying to do now is to initialize Frontera with custom classes (I simply copied the default MessageBus, strategy worker class, and HBase class) and debug every single step they do. I hope to resolve my issues within a couple of days.