fbcrawl

Page scraped, clicking on "more"! ERROR

Open psegovias opened this issue 4 years ago • 3 comments

Today I got this error; it was working fine until yesterday:

2020-04-16 15:55:18 [facebook] INFO: Scraping facebook page https://mbasic.facebook.com/DonaldTrump
2020-04-16 15:55:20 [facebook] INFO: First page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586987412%3A04611686018427387904%3A09223372036854775803%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586987412%3A04611686018427387904%3A09223372036854775803%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:21 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586965926%3A04611686018427387904%3A09223372036854775798%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586965926%3A04611686018427387904%3A09223372036854775798%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:23 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586903168%3A04611686018427387904%3A09223372036854775793%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586903168%3A04611686018427387904%3A09223372036854775793%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:24 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586874136%3A04611686018427387904%3A09223372036854775788%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586874136%3A04611686018427387904%3A09223372036854775788%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:26 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586814589%3A04611686018427387904%3A09223372036854775783%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586814589%3A04611686018427387904%3A09223372036854775783%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:27 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586793359%3A04611686018427387904%3A09223372036854775778%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586793359%3A04611686018427387904%3A09223372036854775778%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:28 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586740702%3A04611686018427387904%3A09223372036854775773%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586740702%3A04611686018427387904%3A09223372036854775773%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:29 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586724308%3A04611686018427387904%3A09223372036854775767%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586724308%3A04611686018427387904%3A09223372036854775767%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:31 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586707937%3A04611686018427387904%3A09223372036854775762%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586707937%3A04611686018427387904%3A09223372036854775762%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:33 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586652089%3A04611686018427387904%3A09223372036854775757%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586652089%3A04611686018427387904%3A09223372036854775757%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:34 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586647937%3A04611686018427387904%3A09223372036854775752%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586647937%3A04611686018427387904%3A09223372036854775752%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:36 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586633431%3A04611686018427387904%3A09223372036854775747%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586633431%3A04611686018427387904%3A09223372036854775747%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:37 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586606903%3A04611686018427387904%3A09223372036854775742%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586606903%3A04611686018427387904%3A09223372036854775742%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:38 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586551847%3A04611686018427387904%3A09223372036854775737%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586551847%3A04611686018427387904%3A09223372036854775737%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:40 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586531800%3A04611686018427387904%3A09223372036854775732%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586531800%3A04611686018427387904%3A09223372036854775732%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:55:41 [facebook] INFO: Page scraped, clicking on "more"! new_page = https://mbasic.facebook.com/DonaldTrump?sectionLoadingID=m_timeline_loading_div_1588316399_0_36_timeline_unit%3A1%3A00000000001586526777%3A04611686018427387904%3A09223372036854775727%3A04611686018427387904&unit_cursor=timeline_unit%3A1%3A00000000001586526777%3A04611686018427387904%3A09223372036854775727%3A04611686018427387904&timeend=1588316399&timestart=0&tm=AQB10XmdcwcSeu_r&refid=17
2020-04-16 15:57:11 [facebook] INFO: [!] "more" link not found, will look for a "year" link
2020-04-16 15:57:11 [facebook] INFO: Crawling has finished with no errors!
2020-04-16 15:57:11 [scrapy.core.engine] INFO: Closing spider (finished)
2020-04-16 15:57:11 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 99658,
 'downloader/request_count': 79,
 'downloader/request_method_count/GET': 78,
 'downloader/request_method_count/POST': 1,
 'downloader/response_bytes': 1528479,
 'downloader/response_count': 79,
 'downloader/response_status_count/200': 78,
 'downloader/response_status_count/302': 1,
 'elapsed_time_seconds': 116.639633,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2020, 4, 16, 15, 57, 11, 405248),
 'log_count/INFO': 90,
 'memusage/max': 67563520,
 'memusage/startup': 51879936,
 'request_depth_max': 77,
 'response_received_count': 78,
 'scheduler/dequeued': 79,
 'scheduler/dequeued/memory': 79,
 'scheduler/enqueued': 79,
 'scheduler/enqueued/memory': 79,
 'start_time': datetime.datetime(2020, 4, 16, 15, 55, 14, 765615)}
2020-04-16 15:57:11 [scrapy.core.engine] INFO: Spider closed (finished)

psegovias, Apr 16 '20 15:04

@psegovias I've faced exactly the same problem. Have you succeeded in solving it?

Update: in the parse_page function in fbcrawler, change the line

for post in response.xpath("//div[contains(@data-ft,'top_level_post_id')]"):

to

for post in response.xpath("//article[contains(@data-ft,'top_level_post_id')]"):
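Before editing the XPath, it can help to check which tag currently wraps the posts, by counting the elements whose data-ft attribute mentions top_level_post_id. The sketch below uses only the Python standard library. The class name, the function name, and the sample markup are made up for illustration; they are not part of fbcrawl, and the sample is not Facebook's actual HTML.

```python
from collections import Counter
from html.parser import HTMLParser


class PostContainerCounter(HTMLParser):
    """Count, per tag name, the elements whose data-ft mentions top_level_post_id."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        data_ft = dict(attrs).get("data-ft") or ""
        if "top_level_post_id" in data_ft:
            self.counts[tag] += 1


def count_post_containers(html):
    """Return a dict such as {'article': 2} for the given HTML string."""
    parser = PostContainerCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)


# Hypothetical markup, only meant to mimic the div -> article change.
SAMPLE = """
<div id="recent">
  <article data-ft='{"top_level_post_id":"1"}'><p>first</p></article>
  <article data-ft='{"top_level_post_id":"2"}'><p>second</p></article>
  <div data-ft='{"tn":"*s"}'>not a post</div>
</div>
"""

if __name__ == "__main__":
    print(count_post_containers(SAMPLE))
```

If the count for div is zero and the count for article is not, the original div-based XPath matches nothing, which would be consistent with a crawl that follows every "more" link yet reports "finished with no errors".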

Nexx0f, Apr 27 '20 10:04

Had the same issue, the fix above worked for me.

lillig, May 01 '20 15:05

Are you scraping only public pages, or also private pages you are a member of? Because for the latter it doesn't work.

natsinger, Jun 09 '20 12:06