Facebook Scraper Stops Fetching Posts After 1,140 (Should Include More Than 13,000) with No Next Page URL

Open nobuusa opened this issue 2 years ago • 0 comments

Issue Description: I'm using the Facebook Scraper library to crawl a page that should theoretically have over 13,000 posts. However, after scraping 1,140 posts, the scraper doesn't find a next page URL, and no further posts are retrieved.

Thanks.

- Below is the code I'm using:

import pandas as pd
from facebook_scraper import get_posts
import sys
from time import sleep
from random import randint

i = 0
P = pd.DataFrame()
start = 9881501633339233
end = 2012
fanpage = '100044382131088'
page_default = 40000

for post in get_posts(fanpage, timeout=1200, pages=page_default, cookies="www.facebook.com_cookies.txt", options={"reactors": True,"posts_per_page": 200, "allow_extra_requests": False}):
    print(int(post['post_id']))
    if int(post['post_id']) >= start:
        continue 
    else:
        new_row = {
            'user_id': str(post['user_id']) if post['user_id'] else "",
            'username': str(post['username']) if post['username'] else "",
            'time': post['time'] if post['time'] else 0,
            'post_url': post['post_url'] if post['post_url'] else "",
            'post_id': str(post['post_id']) if post['post_id'] else "",
            'post_text': post['post_text'].strip().replace("\n", "") if post['post_text'] else "",
            'like_count': post['reactions']['讚'] if post['reactions'] and '讚' in post['reactions'] else 0,
            'love_count': post['reactions']['大心'] if post['reactions'] and '大心' in post['reactions'] else 0,
            'go_count': post['reactions']['加油'] if post['reactions'] and '加油' in post['reactions'] else 0,
            'wow_count': post['reactions']['哇'] if post['reactions'] and '哇' in post['reactions'] else 0,
            'haha_count': post['reactions']['哈'] if post['reactions'] and '哈' in post['reactions'] else 0,
            'sad_count': post['reactions']['嗚'] if post['reactions'] and '嗚' in post['reactions'] else 0,
            'angry_count': post['reactions']['怒'] if post['reactions'] and '怒' in post['reactions'] else 0,
            'share_count': post['shares'] if post['shares'] else 0,
            'comment_count': post['comments'] if post['comments'] else 0
        }
        P = pd.concat([P, pd.DataFrame([new_row])], ignore_index=True)
        i = i + 1
        print("\n\t>>>>> DONE{}.....POST_ID: {}  {}\n\t>>>>> {}\n\n".format(i, str(post['post_id']), str(post['time']), str(post['post_url'])))
        sleep(randint(1, 10))
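
The row-building step in the script above can be sketched as a small helper. This is only an illustration, not part of the library: `post_to_row`, `REACTION_COLUMNS`, and the sample post are hypothetical names and data, and the sketch assumes the intent is to store `shares` under `share_count` and `comments` under `comment_count`. Rows are collected in a plain list so that a DataFrame can be built once after the loop (for example with `pd.DataFrame(rows)`) instead of calling `pd.concat` for every post.

```python
# Reaction labels as they appear in the script above (zh-TW locale).
REACTION_COLUMNS = {
    "like_count": "讚",
    "love_count": "大心",
    "go_count": "加油",
    "wow_count": "哇",
    "haha_count": "哈",
    "sad_count": "嗚",
    "angry_count": "怒",
}


def post_to_row(post):
    """Flatten one post dict into a row, tolerating missing or None fields."""
    reactions = post.get("reactions") or {}
    row = {
        "user_id": str(post.get("user_id") or ""),
        "username": str(post.get("username") or ""),
        "time": post.get("time") or 0,
        "post_url": post.get("post_url") or "",
        "post_id": str(post.get("post_id") or ""),
        "post_text": (post.get("post_text") or "").strip().replace("\n", ""),
        # Assumption: shares belong in share_count, comments in comment_count.
        "share_count": post.get("shares") or 0,
        "comment_count": post.get("comments") or 0,
    }
    for column, label in REACTION_COLUMNS.items():
        row[column] = reactions.get(label, 0)
    return row


rows = []  # append one row per post; build the DataFrame once after the loop
# Hypothetical sample post, used only to exercise the helper.
sample = {"post_id": "123", "post_text": " hello\nworld ",
          "reactions": {"讚": 5}, "comments": 2, "shares": 1}
rows.append(post_to_row(sample))
print(rows[0]["post_text"], rows[0]["like_count"], rows[0]["comment_count"])
```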

- Below is the log output when no next page URL is found:

Looking for next page URL
Requesting page from: https://m.facebook.com/profile/timeline/stream/?cursor=AQHRl8i4R59z2pLAKR1wRhxrbAE7LUS38mNzajqygFXB4Pl0vCAT7RRW9PdIB3NJxvqW4VQMQ8ZyyvcwDrXNzKDJVtXrKp5V80uOLIXkYBoccqnBosJ6fboHkRXM0Mku4hSKOZSUiX62qHYd0ZcD_N8gZ7qCwfF2xsHy1g5PkP-XT6iCDnxLeHeyrfx8RfmaIq8ucDDBtCvhP1lidYOzDv4X66Ofpa_GxsdKQPCWhXu3DvN_Tnyb5-GT8oeoNzRh9M-H_Sunx2E1h39U9_U605xN-Q2b3KXk5oKjjeIS_eSgDJ4PykIuRDdnS28oLngetWhtzf_Xr30razrHpLeCsDAdJOw50vb-ojyvLjn5PUgj4Ynx3psKOy9H7QaJ4gh5wn3SUfBXYO_Sigh33c1OctH9qP1DYNtFZjAzhJuCB1zfUaKD4vlglB44I0LfhLJodi1djMZaSdFFDgBWPPPjxPoJOEukeltGc_zPCOdNq4H8UlW5LKkv14J7iqcotjy7UalvphXFuIonEqvMkqI-GY_82nnsHP9SXBPqEcPaY2xoC_Wy3Nhx9a081Fvcx9qNQXVL608gbkA-o8PtF1IYk_w1jkpPwhIi_TI8hYRU1Ik-yac1lrxkH7eI2H9h43tB7R6AY9GUhnBedg0KA0bJcrCISoNjid-QW1NPggpOqqTy9VVjtUD4yNB2j7FKj1br-O5fzCqWir_OFSYF0agwnGANPS4CU6Ah1uv7YftCfXqwGjNRNXsKN5G2wGTp1PwYjHModCZN2oQeNwyFLf66s9cFtN4HwDKTmv4D0lj3xDoaq_BQRW-FG9MdJtYVE_j0x_3G&start_time=-9223372036854775808&profile_id=100044382131088&replace_id=u_0_0_dP\
Parsing page response
No raw posts (<article> elements) were found in this page.
The page url is: https://m.facebook.com/profile/timeline/stream/?cursor=AQHRl8i4R59z2pLAKR1wRhxrbAE7LUS38mNzajqygFXB4Pl0vCAT7RRW9PdIB3NJxvqW4VQMQ8ZyyvcwDrXNzKDJVtXrKp5V80uOLIXkYBoccqnBosJ6fboHkRXM0Mku4hSKOZSUiX62qHYd0ZcD_N8gZ7qCwfF2xsHy1g5PkP-XT6iCDnxLeHeyrfx8RfmaIq8ucDDBtCvhP1lidYOzDv4X66Ofpa_GxsdKQPCWhXu3DvN_Tnyb5-GT8oeoNzRh9M-H_Sunx2E1h39U9_U605xN-Q2b3KXk5oKjjeIS_eSgDJ4PykIuRDdnS28oLngetWhtzf_Xr30razrHpLeCsDAdJOw50vb-ojyvLjn5PUgj4Ynx3psKOy9H7QaJ4gh5wn3SUfBXYO_Sigh33c1OctH9qP1DYNtFZjAzhJuCB1zfUaKD4vlglB44I0LfhLJodi1djMZaSdFFDgBWPPPjxPoJOEukeltGc_zPCOdNq4H8UlW5LKkv14J7iqcotjy7UalvphXFuIonEqvMkqI-GY_82nnsHP9SXBPqEcPaY2xoC_Wy3Nhx9a081Fvcx9qNQXVL608gbkA-o8PtF1IYk_w1jkpPwhIi_TI8hYRU1Ik-yac1lrxkH7eI2H9h43tB7R6AY9GUhnBedg0KA0bJcrCISoNjid-QW1NPggpOqqTy9VVjtUD4yNB2j7FKj1br-O5fzCqWir_OFSYF0agwnGANPS4CU6Ah1uv7YftCfXqwGjNRNXsKN5G2wGTp1PwYjHModCZN2oQeNwyFLf66s9cFtN4HwDKTmv4D0lj3xDoaq_BQRW-FG9MdJtYVE_j0x_3G&start_time=-9223372036854775808&profile_id=100044382131088&replace_id=u_0_0_dP%5C
The page content is:
+------------------------------------------------------------
+------------------------------------------------------------

Got 0 raw posts from page
Extracting posts from page 114
Looking for next page URL
Page parser did not find next page URL
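
To examine the request that returned no posts, the logged URL can be split into its query parameters with the standard library. This is a diagnostic sketch only: `inspect_page_url` is a hypothetical helper, and the cursor is shortened to a placeholder here. It shows that `replace_id` ends in a backslash (logged as `%5C`) and that `start_time` is the minimum signed 64-bit integer. Whether either value is related to the empty response is not established by the log.

```python
from urllib.parse import urlsplit, parse_qs


def inspect_page_url(url):
    """Return the query parameters of a pagination URL as a plain dict."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}


# Cursor shortened to a placeholder; the other parameters are copied from the log.
logged_url = (
    "https://m.facebook.com/profile/timeline/stream/"
    "?cursor=CURSOR_PLACEHOLDER"
    "&start_time=-9223372036854775808"
    "&profile_id=100044382131088"
    "&replace_id=u_0_0_dP%5C"
)

params = inspect_page_url(logged_url)
print(repr(params["replace_id"]))                # percent-decoded value
print(int(params["start_time"]) == -(2 ** 63))   # compare with int64 minimum
```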

In short, no further posts are retrieved, although the page should have more than 13,000. Thanks for helping me solve this.

nobuusa, Oct 02 '23 23:10