
Connection parameters not working

Open drprabhakar opened this issue 9 years ago • 14 comments

I have given the following in my scrapy settings.py file:

```python
RABBITMQ_CONNECTION_PARAMETERS = {'host': 'amqp://username:password@rabbitmqserver', 'port': 5672}
```

But I am getting the following error:

```
raise exceptions.AMQPConnectionError(error)
pika.exceptions.AMQPConnectionError: [Errno 11003] getaddrinfo failed
```

How can I use a RabbitMQ server with my credentials?

drprabhakar avatar Oct 20 '15 10:10 drprabhakar
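(Editor's note: the getaddrinfo failure is consistent with pika receiving the entire AMQP URL as the `host` value and trying to resolve that whole string as a DNS name. A minimal sketch, using only the standard library, of splitting the URL from the report into the separate fields a connection normally expects:)

```python
from urllib.parse import urlparse

# The reported settings put a full AMQP URL into 'host', so the DNS
# lookup is attempted on the literal string
# 'amqp://username:password@rabbitmqserver' -- hence getaddrinfo failed.
# Splitting the URL shows which piece belongs in which parameter:
url = urlparse('amqp://username:password@rabbitmqserver')

print(url.hostname)                 # the value that should go in 'host'
print(url.username, url.password)   # belong in a credentials object instead
```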

I doubt these settings will work in this library. Try passing it a pika.credentials.Credentials object instead; that is what connection.py expects.

rdcprojects avatar Oct 20 '15 10:10 rdcprojects
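(Editor's note: a sketch of what that could look like in settings.py, assuming the library expands `RABBITMQ_CONNECTION_PARAMETERS` as keyword arguments to `pika.ConnectionParameters`. The key names below are pika's own, but whether this library forwards a `credentials` entry untouched is an assumption, not verified against its connection.py:)

```python
# settings.py -- a sketch, not a confirmed configuration for this library.
# Assumes the dict is passed as pika.ConnectionParameters(**params), so
# 'host' must be a bare hostname and credentials go in a pika object.
import pika

RABBITMQ_CONNECTION_PARAMETERS = {
    'host': 'rabbitmqserver',   # hostname only, not an amqp:// URL
    'port': 5672,
    'credentials': pika.credentials.PlainCredentials('username', 'password'),
}
```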

I am not sure how to pass that through the settings.py file. Can you please assist with how to provide a pika.credentials.Credentials object in settings.py?

drprabhakar avatar Oct 20 '15 11:10 drprabhakar

I have connected to my RabbitMQ using a pika.credentials.Credentials object, but now I am receiving the following error:

```
pika.exceptions.ChannelClosed: (404, "NOT_FOUND - no queue 'multidomain:requests' in vhost '/'")
```

Any suggestion for this?

drprabhakar avatar Oct 21 '15 06:10 drprabhakar

Can you create the queue manually and give it a try?

rdcprojects avatar Oct 21 '15 06:10 rdcprojects

I have created a queue 'multidomain' manually in RabbitMQ and tried again; I am getting the same error.

Do you mean to create the queue from scrapy spider?

drprabhakar avatar Oct 21 '15 08:10 drprabhakar

The queue is "multidomain:requests".

rdcprojects avatar Oct 21 '15 08:10 rdcprojects

I tried with the queue name "multidomain:requests" and am now getting the error below from `scrapy_rabbitmq\queue.py`:

```
return response.message_count
exceptions.AttributeError: 'Method' object has no attribute 'message_count'
```

It seems that scheduler is not working as expected.

Is there any fix for this?

drprabhakar avatar Oct 21 '15 08:10 drprabhakar
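(Editor's note: the AttributeError points at how pika wraps `queue_declare` replies: the broker's answer comes back as a Method frame, and the DeclareOk payload carrying `message_count` sits one level down, under the frame's `.method` attribute. A sketch with stand-in classes, pika not required, showing the access path the fork presumably fixes:)

```python
# Stand-ins for the pika objects involved (illustrative only).
class DeclareOk:            # plays the role of pika.spec.Queue.DeclareOk
    def __init__(self, message_count):
        self.message_count = message_count

class MethodFrame:          # plays the role of pika.frame.Method
    def __init__(self, method):
        self.method = method

def queue_length(frame):
    # queue.py raised AttributeError by reading `frame.message_count`
    # directly; the count lives on the wrapped method instead:
    return frame.method.message_count

frame = MethodFrame(DeclareOk(3))
print(queue_length(frame))  # 3
```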

Try my fork. I've fixed these issues.

rdcprojects avatar Oct 21 '15 08:10 rdcprojects

I have worked with your fork (rdcprojects/scrapy-rabbitmq) and run my scrapy spider. For testing, my script just crawls a single field from a URL and prints it.

I am getting the following error: `cPickle.BadPickleGet: 116`

Is there anything I have to do with my scrapy spider?

drprabhakar avatar Oct 21 '15 10:10 drprabhakar

Can you provide the full traceback?

rdcprojects avatar Oct 21 '15 10:10 rdcprojects

```
2015-10-21 15:21:50+0530 [multidomain] INFO: Spider opened
2015-10-21 15:21:50+0530 [multidomain] DEBUG: Resuming crawl (1 requests scheduled)
2015-10-21 15:21:50+0530 [multidomain] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-10-21 15:21:50+0530 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-10-21 15:21:50+0530 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2015-10-21 15:21:50+0530 [-] Unhandled Error
        Traceback (most recent call last):
          File "C:\Python27\lib\site-packages\scrapy\crawler.py", line 93, in start
            self.start_reactor()
          File "C:\Python27\lib\site-packages\scrapy\crawler.py", line 130, in start_reactor
            reactor.run(installSignalHandlers=False)  # blocking call
          File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 1192, in run
            self.mainLoop()
          File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 1201, in mainLoop
            self.runUntilCurrent()
        --- <exception caught here> ---
          File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 824, in runUntilCurrent
            call.func(*call.args, **call.kw)
          File "C:\Python27\lib\site-packages\scrapy\utils\reactor.py", line 41, in __call__
            return self._func(*self._a, **self._kw)
          File "C:\Python27\lib\site-packages\scrapy\core\engine.py", line 107, in _next_request
            if not self._next_request_from_scheduler(spider):
          File "C:\Python27\lib\site-packages\scrapy\core\engine.py", line 134, in _next_request_from_scheduler
            request = slot.scheduler.next_request()
          File "C:\Python27\lib\site-packages\scrapy_rabbitmq\scheduler.py", line 73, in next_request
            request = self.queue.pop()
          File "C:\Python27\lib\site-packages\scrapy_rabbitmq\queue.py", line 70, in pop
            return self._decode_request(body)
          File "C:\Python27\lib\site-packages\scrapy_rabbitmq\queue.py", line 29, in _decode_request
            return request_from_dict(pickle.loads(encoded_request), self.spider)
        cPickle.BadPickleGet: 116
```

drprabhakar avatar Oct 21 '15 10:10 drprabhakar
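(Editor's note: `BadPickleGet: 116` fits a queue message that is a raw URL string rather than a pickled request dict. The scheduler runs `pickle.loads` on the body; the leading byte `h` of `http...` is read as pickle's BINGET opcode, and the next byte `t` (ASCII 116) becomes the memo index that was never stored. A sketch, on Python 3 where the same failure surfaces as `UnpicklingError`:)

```python
import pickle

# What the scheduler expects to pop: a pickled request dict (roughly
# the shape scrapy's request_to_dict produces).
body = pickle.dumps({'url': 'http://www.domain.com/query', 'method': 'GET'})
request_dict = pickle.loads(body)
print(request_dict['url'])

# What a hand-pushed bare URL looks like to pickle.loads: 'h' is the
# BINGET opcode, 't' (ASCII 116) an unknown memo index -- the same
# failure cPickle reports as BadPickleGet: 116.
try:
    pickle.loads(b'http://www.domain.com/query')
except pickle.UnpicklingError as exc:
    print('unpickling failed:', exc)
```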

I think we'll have to dig deeper into the library to make it work. You can get in touch with the folks on the IRC channel if you want to continue working on the library. Hope this helps!

rdcprojects avatar Oct 21 '15 11:10 rdcprojects

Thanks for the information. Please confirm whether the URLs in the RabbitMQ queue need to be in a specific format, i.e. should the message be `"http://www.domain.com/query"`, `["http://www.domain.com/query"]`, or `http://www.domain.com/query`?

I just want to rule out any issue on the RabbitMQ queue side.

drprabhakar avatar Oct 21 '15 12:10 drprabhakar

You can check the Scrapy documentation for how URLs are stored in the requests queue. There's some encoding/serialization being used; I'm not completely sure about it.

rdcprojects avatar Oct 21 '15 12:10 rdcprojects