fbcrawl
Page scraped, clicking on "more"! ERROR
Today I got this error; it was working fine until yesterday:
2020-04-16 15:55:18 [facebook] INFO: Scraping facebook page https://mbasic.facebook.com/DonaldTrump
2020-04-16 15:55:20 [facebook] INFO: First page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586987412%3A04611686018427387904%3A09223372036854775803%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586987412%3A04611686018427387904%3A09223372036854775803%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:21 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586965926%3A04611686018427387904%3A09223372036854775798%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586965926%3A04611686018427387904%3A09223372036854775798%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:23 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586903168%3A04611686018427387904%3A09223372036854775793%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586903168%3A04611686018427387904%3A09223372036854775793%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:24 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586874136%3A04611686018427387904%3A09223372036854775788%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586874136%3A04611686018427387904%3A09223372036854775788%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:26 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586814589%3A04611686018427387904%3A09223372036854775783%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586814589%3A04611686018427387904%3A09223372036854775783%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:27 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586793359%3A04611686018427387904%3A09223372036854775778%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586793359%3A04611686018427387904%3A09223372036854775778%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:28 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586740702%3A04611686018427387904%3A09223372036854775773%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586740702%3A04611686018427387904%3A09223372036854775773%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:29 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586724308%3A04611686018427387904%3A09223372036854775767%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586724308%3A04611686018427387904%3A09223372036854775767%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:31 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586707937%3A04611686018427387904%3A09223372036854775762%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586707937%3A04611686018427387904%3A09223372036854775762%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:33 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586652089%3A04611686018427387904%3A09223372036854775757%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586652089%3A04611686018427387904%3A09223372036854775757%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:34 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586647937%3A04611686018427387904%3A09223372036854775752%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586647937%3A04611686018427387904%3A09223372036854775752%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:36 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586633431%3A04611686018427387904%3A09223372036854775747%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586633431%3A04611686018427387904%3A09223372036854775747%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:37 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586606903%3A04611686018427387904%3A09223372036854775742%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586606903%3A04611686018427387904%3A09223372036854775742%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:38 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586551847%3A04611686018427387904%3A09223372036854775737%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586551847%3A04611686018427387904%3A09223372036854775737%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:40 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586531800%3A04611686018427387904%3A09223372036854775732%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586531800%3A04611686018427387904%3A09223372036854775732%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:41 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586526777%3A04611686018427387904%3A09223372036854775727%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586526777%3A04611686018427387904%3A09223372036854775727%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:57:11 [facebook] INFO: [!] "more" link not found, will look for a "year" link
2020-04-16 15:57:11 [facebook] INFO: Crawling has finished with no errors!
2020-04-16 15:57:11 [scrapy.core.engine] INFO: Closing spider (finished)
2020-04-16 15:57:11 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 99658,
'downloader/request_count': 79,
'downloader/request_method_count/GET': 78,
'downloader/request_method_count/POST': 1,
'downloader/response_bytes': 1528479,
'downloader/response_count': 79,
'downloader/response_status_count/200': 78,
'downloader/response_status_count/302': 1,
'elapsed_time_seconds': 116.639633,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2020, 4, 16, 15, 57, 11, 405248),
'log_count/INFO': 90,
'memusage/max': 67563520,
'memusage/startup': 51879936,
'request_depth_max': 77,
'response_received_count': 78,
'scheduler/dequeued': 79,
'scheduler/dequeued/memory': 79,
'scheduler/enqueued': 79,
'scheduler/enqueued/memory': 79,
'start_time': datetime.datetime(2020, 4, 16, 15, 55, 14, 765615)}
2020-04-16 15:57:11 [scrapy.core.engine] INFO: Spider closed (finished)
@psegovias I've faced exactly the same problem. Have you succeeded in solving it?
Update: in the parse_page function in fbcrawler, change the line
for post in response.xpath("//div[contains(@data-ft,'top_level_post_id')]"):
to
for post in response.xpath("//article[contains(@data-ft,'top_level_post_id')]"):
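If you want to confirm which tag the posts are wrapped in before editing the spider, here is a minimal stdlib-only sketch that counts, per tag name, the elements whose data-ft attribute contains top_level_post_id. The names PostTagCounter and count_post_tags and the sample markup are made up for illustration and are not part of fbcrawl; feed it a saved copy of the mbasic page instead of the sample.

```python
from collections import Counter
from html.parser import HTMLParser


class PostTagCounter(HTMLParser):
    """Count, per tag name, elements whose data-ft attribute contains a marker."""

    def __init__(self, marker="top_level_post_id"):
        super().__init__()
        self.marker = marker
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        for name, value in attrs:
            if name == "data-ft" and value and self.marker in value:
                self.counts[tag] += 1


def count_post_tags(html):
    """Return a dict like {'article': 2} for the given HTML string."""
    parser = PostTagCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)


# Hypothetical, heavily simplified markup; a real mbasic page is much larger.
sample = """
<div id="root">
  <article data-ft='{"top_level_post_id":"1"}'><p>first</p></article>
  <article data-ft='{"top_level_post_id":"2"}'><p>second</p></article>
  <div data-ft='{"tn":"*s"}'>not a post</div>
</div>
"""

print(count_post_tags(sample))
```

If the counts show article and no div, the old selector matches nothing, which would explain a crawl that walks every "more" link and still finishes without scraping any posts. A tag-agnostic XPath such as //*[contains(@data-ft,'top_level_post_id')] should match either layout, but I have not tested it against fbcrawl and it may also pick up non-post elements.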
Had the same issue, the fix above worked for me.
Are you scraping only public pages, or also private pages you are a member of? For the latter it doesn't work.