weird get_posts() behavior/bug

Open curiousier-george opened this issue 2 years ago • 7 comments

I normally call get_posts() like this:

posts = get_posts(username, cookies=cookie_file, extra_info=True,
                  options={'page_limit': None, 'allow_extra_requests': False, 'HQ_images': False})

But for posts 10215606930220434, 10111743443272349 and 10229044620250382, this returns 0 for likes/reaction_count instead of the correct value, whereas this call:

posts = get_posts(username, cookies=cookie_file, extra_info=True,
                  options={'page_limit': None, 'allow_extra_requests': False, 'reactors': True, 'HQ_images': False})

does. For every other post I've seen, they both return the number of likes/reaction_count properly.

curiousier-george, Jul 24 '22 15:07

This looks to be caused by malformed HTML served by FB, resulting in lxml not putting the footer element in the article element. As a workaround, you can re-fetch these failed posts like so:

from facebook_scraper import get_posts, set_cookies

set_cookies("cookies.txt")
# Re-fetch only the affected posts by ID.
posts = get_posts(post_urls=[10215606930220434, 10111743443272349, 10229044620250382], options={'allow_extra_requests': False})
for post in posts:
    print(post["likes"], post["comments"])

outputs:

74 1
224 36
42 18

neon-ninja, Jul 26 '22 02:07
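
The workaround above can be folded into a two-pass routine: scrape the page once, then re-fetch only the posts whose like count came back empty. What follows is a minimal sketch, not code from facebook-scraper itself. The helper names `ids_needing_refetch` and `refetch_zero_like_posts` are hypothetical, and passing `cookies=` together with `post_urls=` is an assumption based on the two call styles shown in this thread.

```python
from typing import Iterable, List


def ids_needing_refetch(posts: Iterable[dict]) -> List[str]:
    """Return the post_id of every post whose scraped like count is 0 or missing."""
    return [str(p["post_id"]) for p in posts if p.get("post_id") and not p.get("likes")]


def refetch_zero_like_posts(username: str, cookie_file: str) -> List[dict]:
    """Scrape one page, then re-fetch only the posts that came back with no likes."""
    # Imported lazily so ids_needing_refetch works without the library installed.
    from facebook_scraper import get_posts

    options = {"allow_extra_requests": False}
    first_pass = list(get_posts(username, cookies=cookie_file, pages=1, options=options))
    missing = ids_needing_refetch(first_pass)
    if not missing:
        return []
    # Assumption: cookies= is accepted alongside post_urls=, as it is with a username.
    return list(get_posts(post_urls=missing, cookies=cookie_file, options=options))
```

A post that genuinely has zero likes is also re-fetched, so this costs one extra request for each such post.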

Thanks.

Now it seems that

posts = get_posts(username, cookies=cookie_file, extra_info=True,
                  options={'page_limit': None, 'allow_extra_requests': False, 'HQ_images': False})

never returns non-0 values for likes/reaction_count. Does this mean Facebook is changing the HTML format overall?

I'm trying to minimize request count, so I'd like to avoid having to get posts one by one.

curiousier-george, Jul 26 '22 13:07

What username are you using? I tried with dudukovich, and still get the error.


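from pprint import pprint

from facebook_scraper import get_posts, set_cookies

set_cookies("cookies.txt")
posts = get_posts("dudukovich", pages=1, options={'allow_extra_requests': False})
for post in posts:
    # Dump the first post that comes back without a like count.
    if not post["likes"]:
        pprint(post)
        break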

outputs:

{'available': True,
 'comments': 0,
 'comments_full': None,
 'factcheck': None,
 'image': None,
 'image_id': None,
 'image_ids': [],
 'image_lowquality': None,
 'images': None,
 'images_description': None,
 'images_lowquality': [],
 'images_lowquality_description': [],
 'is_live': False,
 'likes': 0,
 'link': None,
 'links': [],
 'original_text': None,
 'page_id': None,
 'post_id': '10229044620250382',
 'post_text': 'Finally nailed it. My work here is done.\n'
              '\n'
              'Wordle 400 1/6\n'
              '\n'
              '🟩🟩🟩🟩🟩',
 'post_url': 'https://facebook.com/dudukovich/posts/10229044620250382',
 'reaction_count': None,
 'reactions': None,
 'reactors': None,
 'shared_post_id': None,
 'shared_post_url': None,
 'shared_text': '',
 'shared_time': None,
 'shared_user_id': None,
 'shared_username': None,
 'sharers': None,
 'shares': 0,
 'text': 'Finally nailed it. My work here is done.\n\nWordle 400 1/6\n\n🟩🟩🟩🟩🟩',
 'time': datetime.datetime(2022, 7, 24, 8, 37),
 'timestamp': None,
 'user_id': 1539088457,
 'user_url': 'https://facebook.com/dudukovich?lst=100068943456113%3A1539088457%3A1658786006&refid=17&_ft_=encrypted_tracking_data.0AY_nlA7aEd7s28Fqlm04ViLeX4ILbGh4rrazl3Mj6V2NvD02jBBFJgB4g5JCxg2Wxosvx-eiZpJoDJX_SFTMj-Wy8uHmSNX1PGpNmwlnSknHY1LT3psXmtLY3yOCKRxjyjCzW_7acSga1TPgOsVj8VxoiqLHcQBMpzkx0W3mU1ZxPbN5MlEjKl78LjBUcljP7ioaLcGQ-IHkIJPoJpBNqKHYC8EniRDryYOTsM-DPs2blBJ33x0Q3elahnLlVjxChOVGdunr-31mv5htJHdQpGVod8BRK_gEjNNWFcZ636FJWP4VMs66fgzGtQkYV4Tgr7Vbaju81aoc-zpTvmxSUDIAiILKZYvsV5ldRIeOf-8YSkok2TnVhFq7UkBNmwj1Hew8XdmMDb41iVkmu6ZiRhymDAdilV5JVP8bMcOoWUnOT53WIwE9l_bPK4Twb7cfR9mzLKp55f9sYxS4BDKzh_2cgwjweGfCPoHcMoTbpsPSUx96B2aGRJ3kDvJDKPfspQsfOmP9o6IMquqtUJJogFwtApVtzLqXc6owhZr4s4QD3riTNgchT8zmBbTSWODcvXRDHZm7-CJGWEthfjGQ-SCFqM5q8go6iNhigu8VlzC6_0Sc&__tn__=C-R',
 'username': 'Jim Dudukovich',
 'video': None,
 'video_duration_seconds': None,
 'video_height': None,
 'video_id': None,
 'video_quality': None,
 'video_size_MB': None,
 'video_thumbnail': None,
 'video_watches': None,
 'video_width': None,
 'w3_fb_url': None,
 'was_live': False,
 'with': None}

neon-ninja, Jul 26 '22 22:07

Yes, I get it too, on all usernames now. Until the day I posted the first report in this thread, it worked for all usernames on almost all posts, and now it doesn't work at all.

I don't want to get reactors unless I know that there have been new ones because I am trying to minimize requests, so I need to fetch the reaction counts first.

curiousier-george, Jul 27 '22 20:07
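
The request-saving approach described above (check the counts first, fetch reactors only when something is new) could be sketched as follows. This is an illustration under stated assumptions, not part of facebook-scraper: `posts_with_new_reactions` is a hypothetical helper, `previous` is a mapping of post ID to the last count you stored yourself, and "new" is taken to mean the count went up.

```python
from typing import Dict, Iterable, List


def posts_with_new_reactions(posts: Iterable[dict], previous: Dict[str, int]) -> List[str]:
    """Return the post_ids whose like count is higher than the stored count."""
    changed = []
    for post in posts:
        post_id = post["post_id"]
        count = post.get("likes") or 0  # treat None as 0
        if count > previous.get(post_id, 0):
            changed.append(post_id)
    return changed
```

The returned IDs could then be passed as `post_urls` to a second `get_posts` call with `'reactors': True` in `options`, so the extra requests are made only for posts that gained reactions.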

I see. Try https://github.com/kevinzg/facebook-scraper/commit/c4ffccc681b61372f7bf2d85833ac1873c98ed80 and run this test code with that commit installed:

from facebook_scraper import get_posts, set_cookies

set_cookies("cookies.txt")
posts = get_posts("dudukovich", pages=1, options={'allow_extra_requests': False})
for post in posts:
    print(post["post_id"], post["likes"], post["comments"])

I get:

10229047171274156 18 2
10229044620250382 42 18
10228996395204786 38 29
10228978022425478 7 3
10228975542723487 23 2
10228952791554722 30 6
10228940462806511 17 2
10228840341583543 13 5
10228778696482454 13 2
10228776189019769 77 19

neon-ninja, Jul 28 '22 02:07

Yes, thanks, this works great.

It doesn't seem to set reaction_count. I saw a related comment of yours recently, but I didn't completely understand it. When does reaction_count get set?

Also, it looks like the pages parameter now works for profiles? Am I remembering correctly that that didn't use to be the case?

Thanks, again.

curiousier-george, Jul 28 '22 03:07

https://github.com/kevinzg/facebook-scraper/commit/40c1e8a6f81d7a89256abaa0811b301875e1a6d8 should set reaction_count. Normally it is only set when the "reactions" option is enabled, but that involves an extra request (to something like https://m.facebook.com/ufi/reaction/profile/browser/?ft_ent_identifier=10229047171274156).

I think you're thinking of the posts_per_page parameter.

neon-ninja, Jul 28 '22 03:07
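
Because `reaction_count` can be `None` while `likes` is populated (as in the dumped post earlier in this thread), calling code may want a single accessor. This is a minimal sketch with a hypothetical helper name, `best_reaction_count`. It assumes `likes` is an acceptable stand-in when `reaction_count` is absent, which the thread does not confirm.

```python
from typing import Optional


def best_reaction_count(post: dict) -> Optional[int]:
    """Prefer reaction_count; fall back to likes when reaction_count is None."""
    count = post.get("reaction_count")
    if count is None:
        # Assumption: likes approximates the total when reaction_count is unset.
        count = post.get("likes")
    return count
```

The explicit `is None` check keeps a genuine `reaction_count` of 0 from being overridden by `likes`.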