tripadvisor-scraper: This is not working right now.

I'm following the instructions on your README.md file.

When I run scrapy crawl tripadvisor-restaurant -o output/result.json -t json, I get the following error:

2016-07-11 17:26:57 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-07-11 17:26:57 [scrapy] DEBUG: Redirecting (301) to <GET https://www.tripadvisor.com/RestaurantSearch?geo=60763&q=New+York+City%2C+New+York&cat=&pid=> from <GET http://www.tripadvisor.com/RestaurantSearch?geo=60763&q=New+York+City%2C+New+York&cat=&pid=>
2016-07-11 17:26:58 [scrapy] DEBUG: Crawled (200) <GET https://www.tripadvisor.com/RestaurantSearch?geo=60763&q=New+York+City%2C+New+York&cat=&pid=> (referer: None)
2016-07-11 17:26:58 [scrapy] ERROR: Spider error processing <GET https://www.tripadvisor.com/RestaurantSearch?geo=60763&q=New+York+City%2C+New+York&cat=&pid=> (referer: None)
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/Library/Python/2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/Library/Python/2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/Library/Python/2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Library/Python/2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/meyyappan/Desktop/tripadvisor-scraper/tripadvisor-scraper/tripadvisorbot/spiders/tripadvisor-restaurant.py", line 41, in parse
    tripadvisor_item['url'] = self.base_uri + clean_parsed_string(get_parsed_string(snode_restaurant, 'div[@class="quality easyClear"]/span/a[@class="property_title "]/@href'))
TypeError: cannot concatenate 'str' and 'NoneType' objects
2016-07-11 17:26:58 [scrapy] INFO: Closing spider (finished)
2016-07-11 17:26:58 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 926,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 62858,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/301': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 7, 11, 21, 26, 58, 594319),
 'log_count/DEBUG': 3,
 'log_count/ERROR': 1,
 'log_count/INFO': 7,
 'response_received_count': 1,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'spider_exceptions/TypeError': 1,
 'start_time': datetime.datetime(2016, 7, 11, 21, 26, 57, 128111)}
2016-07-11 17:26:58 [scrapy] INFO: Spider closed (finished)

Do you know what's wrong? Can you fix it, or explain it so I can try to fix it myself? Thanks!

Jul 11 '16 21:07 mnachiappan

Yep, a fix would be greatly appreciated. I have no idea how to configure Scrapy!

Aug 08 '16 22:08 alzmcr

+1 Did you guys ever get around this?

Mar 09 '17 09:03 jk2227

unfortunately not :(

Mar 09 '17 09:03 alzmcr

Still not working

Mar 28 '17 15:03 engmsaleh

I have the same error:

2018-05-20 11:50:49 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.tripadvisor.com/RestaurantSearch?geo=60763&q=New+York+City%2C+New+York&cat=&pid=> (referer: None)
Traceback (most recent call last):
  File "d:\python27\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "d:\python27\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 30, in process_spider_output
    for x in result:
  File "d:\python27\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "d:\python27\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "d:\python27\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "D:\Download\tripadvisor-scraper\tripadvisorbot\spiders\tripadvisor-restaurant.py", line 42, in parse
    tripadvisor_item['url'] = self.base_uri + clean_parsed_string(get_parsed_string(snode_restaurant, 'div[@class="quality easyClear"]/span/a[@class="property_title "]/@href'))
TypeError: cannot concatenate 'str' and 'NoneType' objects

May 20 '18 09:05 facilus

It's not working now.

Jan 28 '19 20:01 harshavardhanm03

I don't know the reason, but it wouldn't start scraping with Python 3.6.4. It returned the following output:

Scrapy 1.7.3 - no active project
Unknown command: crawl
Use "scrapy" to see available commands

Oct 02 '19 13:10 j-takurou

@Jumpo-523 it's compatible with Python 2.x as listed in the README.
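
Separately, that particular output ("no active project" followed by "Unknown command: crawl") is what Scrapy prints when it is run from a directory that has no scrapy.cfg in it or in any parent directory; crawl is only available inside a project. Here is a rough sketch of that upward lookup, which you can use to check where you are running from. find_scrapy_cfg is a hypothetical helper written for this thread, not Scrapy's own API:

```python
import os


def find_scrapy_cfg(start="."):
    """Walk upward from `start`; return the first scrapy.cfg path found, else None.

    Hypothetical helper for illustration only, not part of Scrapy.
    """
    path = os.path.abspath(start)
    while True:
        candidate = os.path.join(path, "scrapy.cfg")
        if os.path.isfile(candidate):
            return candidate
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root without finding one
            return None
        path = parent


if __name__ == "__main__":
    cfg = find_scrapy_cfg()
    if cfg is None:
        print("No scrapy.cfg found; cd into the project folder before running scrapy crawl.")
    else:
        print("Project config found at: " + cfg)
```

If this reports no scrapy.cfg, changing into the cloned repository folder that contains it should at least make the crawl command available.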

Moreover, the project would need a revamp to update to the latest versions and fix the parser.
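
For anyone who wants to patch it locally in the meantime: the TypeError in both tracebacks above means the right-hand side of the concatenation on that line was None, which is what you would expect if the XPath no longer matches TripAdvisor's current markup. Below is a minimal sketch of a guard, assuming clean_parsed_string(...) returns None when nothing matched. build_item_url is a hypothetical name, not something in the project, and the example path is made up:

```python
def build_item_url(base_uri, href):
    """Join base_uri and href, or return None if the XPath matched nothing.

    Hypothetical helper; assumes the project's parsing helpers hand back
    None when the selector finds no node.
    """
    if href is None:
        return None
    return base_uri + href.strip()


# Intended use inside parse() (names taken from the traceback):
#   href = clean_parsed_string(get_parsed_string(snode_restaurant, xpath))
#   url = build_item_url(self.base_uri, href)
#   if url is None:
#       continue  # skip this node instead of crashing the whole spider
#   tripadvisor_item['url'] = url

if __name__ == "__main__":
    print(build_item_url("https://www.tripadvisor.com", "/Example_Path.html"))
    print(build_item_url("https://www.tripadvisor.com", None))
```

This only stops the spider from crashing. The selectors themselves would still need to be updated against the current page before any items come back.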

Jun 11 '20 13:06 magic890
