scrapy-rabbitmq
Connection parameters not working
I have given the following in my Scrapy settings.py file:

```python
RABBITMQ_CONNECTION_PARAMETERS = {'host': 'amqp://username:password@rabbitmqserver', 'port': 5672}
```

But I am getting the following error:

```
raise exceptions.AMQPConnectionError(error)
pika.exceptions.AMQPConnectionError: [Errno 11003] getaddrinfo failed
```
How can I use the RabbitMQ server with my credentials?
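The `getaddrinfo failed` error suggests the resolver was handed the entire `amqp://...` string as a hostname. As a quick sanity check (standard library only, no pika required), a URL like that can be split into the separate fields a connection normally expects:

```python
from urllib.parse import urlparse

# The settings above put a full AMQP URL into 'host', so name resolution
# tries to look up the literal string "amqp://username:password@rabbitmqserver".
# Splitting the URL shows the pieces that belong in separate fields:
url = urlparse("amqp://username:password@rabbitmqserver:5672")
print(url.hostname)   # rabbitmqserver
print(url.port)       # 5672
print(url.username)   # username
```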
I doubt these settings will work with this library. Try passing it a pika.credentials.Credentials object; that is what connection.py expects.
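For example (a sketch only, assuming the `RABBITMQ_CONNECTION_PARAMETERS` dict is forwarded more or less directly into `pika.ConnectionParameters` — check connection.py for the exact keys it accepts):

```python
# settings.py (sketch): use a bare hostname plus a pika credentials
# object, rather than embedding the credentials in an amqp:// URL.
import pika

RABBITMQ_CONNECTION_PARAMETERS = {
    'host': 'rabbitmqserver',   # hostname only, no amqp:// prefix
    'port': 5672,
    'credentials': pika.credentials.PlainCredentials('username', 'password'),
}
```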
I am not sure how can I pass it through settings.py file. Can you please assist how can I give that pika.credentials.Credentials object in settings.py file?
I have connected to my RabbitMQ using a pika.credentials.Credentials object, but now I am receiving the following error:

```
pika.exceptions.ChannelClosed: (404, "NOT_FOUND - no queue 'multidomain:requests' in vhost '/'")
```

Any suggestion for this?
Can you create the queue manually and give it a try?
I have created a queue 'multidomain' manually in RabbitMQ and tried again, but I am getting the same error.
Do you mean to create the queue from the Scrapy spider?
The queue is "multidomain:requests".
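Judging from the 404 message, the queue name appears to be derived from the spider name plus a `:requests` suffix, so a hand-made queue must match that exact name. A minimal sketch of that assumed naming scheme (the `queue_key` helper is hypothetical, not part of the library):

```python
# Assumption: the scheduler builds its queue key as "<spider_name>:requests",
# which is why a queue named just "multidomain" is not found.
def queue_key(spider_name, suffix="requests"):
    return f"{spider_name}:{suffix}"

print(queue_key("multidomain"))   # multidomain:requests
```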
I tried with the queue name "multidomain:requests" and now get the error below from "\scrapy_rabbitmq\queue.py":

```
return response.message_count
exceptions.AttributeError: 'Method' object has no attribute 'message_count'
```
It seems that the scheduler is not working as expected.
Is there any fix for this?
Try my fork. I've fixed these issues.
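For reference, the `message_count` AttributeError above looks like the count was being read off the wrong object: pika's `queue_declare` returns a `Method` frame, and the `DeclareOk` fields (including `message_count`) live one level down on its `.method` attribute. A sketch of the access pattern, using stand-in classes instead of pika so it runs anywhere:

```python
# Stand-ins mimicking pika's frame structure (assumption: queue_declare
# returns a frame whose .method attribute carries the DeclareOk fields).
class DeclareOk:
    def __init__(self, message_count):
        self.message_count = message_count

class MethodFrame:          # stand-in for pika.frame.Method
    def __init__(self, method):
        self.method = method

def queue_length(response):
    # Broken:  return response.message_count        -> AttributeError
    # Working: unwrap the frame first
    return response.method.message_count

print(queue_length(MethodFrame(DeclareOk(3))))   # 3
```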
I have worked with your fork (rdcprojects/scrapy-rabbitmq) and ran my Scrapy spider. To test my script, I simply crawled one field from a URL and printed it.
I am getting the following error: `cPickle.BadPickleGet: 116`
Is there anything I have to do with my scrapy spider?
Can you provide full traceback?
```
2015-10-21 15:21:50+0530 [multidomain] INFO: Spider opened
2015-10-21 15:21:50+0530 [multidomain] DEBUG: Resuming crawl (1 requests scheduled)
2015-10-21 15:21:50+0530 [multidomain] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-10-21 15:21:50+0530 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-10-21 15:21:50+0530 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2015-10-21 15:21:50+0530 [-] Unhandled Error
        Traceback (most recent call last):
          File "C:\Python27\lib\site-packages\scrapy\crawler.py", line 93, in start
            self.start_reactor()
          File "C:\Python27\lib\site-packages\scrapy\crawler.py", line 130, in start_reactor
            reactor.run(installSignalHandlers=False) # blocking call
          File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 1192, in run
            self.mainLoop()
          File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 1201, in mainLoop
            self.runUntilCurrent()
        ---
        cPickle.BadPickleGet: 116
```
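A plausible reading of this traceback (an assumption, not confirmed from the library): the scheduler tried to unpickle a message body that was never pickled, e.g. a raw URL string pushed into the queue by hand. Suggestively, 116 is the byte value of 't', which is exactly where unpickling a body starting with "ht…" would trip. Python 3's pickle fails the same way on such bytes:

```python
import pickle

payload = b"http://www.domain.com/query"   # a bare URL string, not a pickle
try:
    pickle.loads(payload)
    failed = False
except Exception as exc:                   # Python 2's cPickle raised BadPickleGet here
    failed = True
    print(type(exc).__name__)
```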
I think we'll have to dig deeper into the library to make it work. You can get in touch with the folks on the IRC channel if you want to continue working on the library. Hope this helps!
Thanks for the information. Please confirm whether the URLs in the RabbitMQ queue should be in a specific format, i.e. should the message in the queue be `"http://www.domain.com/query"`, `["http://www.domain.com/query"]`, or plain `http://www.domain.com/query`?
I just want to confirm that there are no issues on the RabbitMQ side.
You can check the Scrapy documentation for how URLs are stored in the requests queue. There's some encoding/serialization being used; I'm not completely sure about it.
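On the serialization point: assuming the scheduler pickles each request before publishing it (which the BadPickleGet error above hints at), a message pushed manually would need to be a pickle of the same structure, not a plain URL string. A minimal sketch of the idea, with an assumed request-dict shape:

```python
import pickle

# Hypothetical request shape; the real library may pickle full
# scrapy.Request objects or dicts produced by Scrapy's request serializer.
request = {"url": "http://www.domain.com/query", "method": "GET"}

body = pickle.dumps(request)      # bytes suitable for publishing to the queue
restored = pickle.loads(body)
print(restored["url"])
```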