
KeyError: 'comments_full' when trying to scrape comments on photo posts

Open sk1pd1v1d3d opened this issue 2 years ago • 5 comments

Hello,

I would appreciate any help I can get with a problem I'm experiencing trying to scrape comments on photo posts.

I received the following error message

"Traceback (most recent call last): line 42, in for comment in post["comments_full"]: KeyError: 'comments_full'"

when I ran the following code

from facebook_scraper import get_posts, set_cookies
import pandas as pd

post_ids = ["https://www.facebook.com/85452072376/posts/10158739652817377"]
cookies = "cookies.txt"  # path to the cookies file
set_cookies(cookies)

options = {"comments": True, "progress": True, "comment_reactors": True}

def format_comment(c):
    obj = {
        "comment_id": c["comment_id"],
        "comment_text": c["comment_text"],
        "comment_reaction_count": c["comment_reaction_count"] or 0,
        "reply_count": len(c["replies"]) if "replies" in c else 0,
        "comment_time": c["comment_time"],
        "type": "comment"
    }
    if c["comment_reactions"]:
        obj.update(c["comment_reactions"])
    return obj

def format_reply(c):
    obj = {
        "comment_id": c["comment_id"],
        "comment_text": c["comment_text"],
        "comment_reaction_count": c["comment_reaction_count"] or 0,
        "reply_count": len(c["replies"]) if "replies" in c else 0,
        "comment_time": c["comment_time"],
        "type": "reply"
    }
    if c["comment_reactions"]:
        obj.update(c["comment_reactions"])
    return obj

fb_comments = []
post = next(get_posts(post_urls=post_ids, options=options))

for comment in post["comments_full"]:
    fb_comments.append(format_comment(comment))

    for reply in comment["replies"]:
        fb_comments.append(format_reply(reply))

pd.DataFrame(fb_comments).to_json("fbcomments.jsonl", orient="records", lines=True)

This seems to affect all photo posts, as it happened to the post at https://www.facebook.com/5281959998/posts/10152772906124999 as well as a few other photo posts I've tried.
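Until the root cause is found, the loop can at least be made to fail softly instead of raising. The following is a minimal sketch, not a fix: it assumes only that the post is a plain dict in which `comments_full` and `replies` may be missing or `None`. `iter_comments_and_replies` is a hypothetical helper name, not part of facebook-scraper, and the sample data is made up for illustration.

```python
def iter_comments_and_replies(post):
    """Yield (kind, item) pairs, tolerating missing 'comments_full' or 'replies'."""
    # .get() avoids the KeyError; `or []` also covers a value of None.
    for comment in post.get("comments_full") or []:
        yield "comment", comment
        for reply in comment.get("replies") or []:
            yield "reply", reply


# Hypothetical sample data standing in for a scraped post.
sample_post = {
    "comments_full": [
        {"comment_id": "1", "replies": [{"comment_id": "2"}]},
        {"comment_id": "3"},  # no "replies" key at all
    ]
}
sample_kinds = [kind for kind, _ in iter_comments_and_replies(sample_post)]

# A post dict without "comments_full" simply yields nothing.
empty_result = list(iter_comments_and_replies({"post_url": "https://www.facebook.com/85452072376/posts/10158739652817377"}))
```

This only hides the symptom: a photo post that comes back without `comments_full` produces an empty result rather than a traceback.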

I'd appreciate any help.

sk1pd1v1d3d • Oct 13 '22 21:10

The same happened to me with posts containing photos or videos. Did you find a solution?

cihenzi • Oct 17 '22 22:10

I haven't found a solution for photo posts, but I've been able to scrape comments on video posts. I believe this comment/thread references previous issues with scraping comments on video posts and a solution that resolved those issues. You may need to update to the latest version of facebook-scraper if you haven't done so already. I've been able to scrape comments on video posts by entering only the part of the URL that starts with "watch" (for example, "watch/?v=1240637946380306") as the post ID.
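To illustrate the shortened form, here is a small sketch that cuts a full watch URL down to the part starting with "watch". `watch_post_id` is a hypothetical helper, and the full URL below is an assumed example built around the video ID above; only the standard library is used.

```python
from urllib.parse import parse_qs, urlparse


def watch_post_id(url):
    """Return the 'watch/?v=<id>' part of a Facebook watch URL, or None."""
    parsed = urlparse(url)
    video_ids = parse_qs(parsed.query).get("v")
    path_segments = [s for s in parsed.path.split("/") if s]
    # Only URLs with a "watch" path segment and a "v" query parameter qualify.
    if "watch" not in path_segments or not video_ids:
        return None
    return f"watch/?v={video_ids[0]}"


# Assumed example URL; only the "watch/?v=..." part comes from this thread.
full_url = "https://www.facebook.com/watch/?v=1240637946380306"
short_id = watch_post_id(full_url)
```

The resulting string is what I passed in the list given to `post_urls` in place of the full URL.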

sk1pd1v1d3d • Oct 18 '22 01:10

I tried again to scrape photo posts, and it's working for me now. I'm not sure what changed, but sorry for the false alarm.

sk1pd1v1d3d • Oct 20 '22 19:10

Actually, I was mistaken; this is still an issue. Scraping comments seems to work for photo posts if clicking on the date posted gives a URL with a pfbid (e.g., "https://www.facebook.com/washingtonpost/posts/pfbid0S8mnLvJpa3JxEoZsdB8RL1FRPcx47zA9YodMD1qQJorngMCTAU1r7NKM9pq7EiRal"). However, other photo post URLs return only the original request URL and the post URL. All of the following posts are examples of photo posts that return only the original request URL and the post URL:

https://www.facebook.com/85452072376/posts/10158762068092377
https://www.facebook.com/85452072376/posts/10158739652817377
https://www.facebook.com/85452072376/posts/10158697208027377
https://www.facebook.com/85452072376/posts/10158765735462377
https://www.facebook.com/85452072376/posts/10158606605817377
https://www.facebook.com/85452072376/posts/10158664085227377
https://www.facebook.com/85452072376/posts/10158695338337377
https://www.facebook.com/5281959998/posts/10152658141439999
https://www.facebook.com/5281959998/posts/10152772906124999
https://www.facebook.com/5281959998/posts/10152670028269999
https://www.facebook.com/5281959998/posts/10152602913874999
https://www.facebook.com/5281959998/posts/10152702669299999
https://www.facebook.com/5281959998/posts/10152787057339999
https://www.facebook.com/5281959998/posts/10152678807814999
https://www.facebook.com/5281959998/posts/10152759098719999
https://www.facebook.com/5281959998/posts/10152590088004999
https://www.facebook.com/5281959998/posts/10152590235174999
https://www.facebook.com/5281959998/posts/10152672624454999
https://www.facebook.com/5281959998/posts/10152780019704999

These are the URLs that appear after clicking the date posted for the above URLs:

https://www.facebook.com/newsmax/photos/a.10151127234237377/10158762068092377
https://www.facebook.com/newsmax/photos/a.10151127234237377/10158739652817377
https://www.facebook.com/newsmax/photos/a.10151127234237377/10158697208027377
https://www.facebook.com/newsmax/photos/a.10151127234237377/10158765735462377
https://www.facebook.com/newsmax/photos/a.10151127234237377/10158606605817377
https://www.facebook.com/newsmax/photos/a.10151127234237377/10158664085227377
https://www.facebook.com/newsmax/photos/a.10151127234237377/10158695338337377
https://www.facebook.com/nytimes/photos/a.283559809998/10152658141439999
https://www.facebook.com/nytimes/photos/a.283559809998/10152772906124999
https://www.facebook.com/nytimes/photos/a.283559809998/10152670028269999
https://www.facebook.com/nytimes/photos/a.283559809998/10152602913874999
https://www.facebook.com/nytimes/photos/a.283559809998/10152702669299999
https://www.facebook.com/nytimes/photos/a.283559809998/10152787057339999
https://www.facebook.com/nytimes/photos/a.283559809998/10152678807814999
https://www.facebook.com/nytimes/photos/a.283559809998/10152759098719999
https://www.facebook.com/nytimes/photos/a.283559809998/10152590088004999
https://www.facebook.com/nytimes/photos/a.283559809998/10152590235174999
https://www.facebook.com/nytimes/photos/a.283559809998/10152672624454999
https://www.facebook.com/nytimes/photos/a.283559809998/10152780019704999

I've tried using the initial URLs and the URLs that appear after clicking the date posted as post IDs, but they still only return the original request URL and the post URL.
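To make the three URL shapes easier to tell apart when testing in bulk, here is a rough sketch that labels a URL as pfbid, photo, or numeric and pulls out the trailing ID. `describe_post_url` and its labels are hypothetical names for illustration, and the classification is based only on the URL patterns listed in this comment.

```python
from urllib.parse import urlparse


def describe_post_url(url):
    """Return (form, trailing_id) for a Facebook post URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    trailing = segments[-1] if segments else ""
    if trailing.startswith("pfbid"):
        form = "pfbid"    # the form that worked for me
    elif "photos" in segments:
        form = "photo"    # .../photos/a.<album>/<id>
    elif "posts" in segments:
        form = "numeric"  # .../<page id>/posts/<id>
    else:
        form = "other"
    return form, trailing


numeric = describe_post_url("https://www.facebook.com/85452072376/posts/10158762068092377")
photo = describe_post_url("https://www.facebook.com/newsmax/photos/a.10151127234237377/10158762068092377")
pfbid = describe_post_url(
    "https://www.facebook.com/washingtonpost/posts/"
    "pfbid0S8mnLvJpa3JxEoZsdB8RL1FRPcx47zA9YodMD1qQJorngMCTAU1r7NKM9pq7EiRal"
)
```

Note that the numeric and photo forms of the same post share the same trailing ID, so neither form gives the scraper any extra information over the other.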

Here is the code I used:

from facebook_scraper import get_posts, set_cookies
import pandas as pd

post_ids = ["https://www.facebook.com/85452072376/posts/10158762068092377"]
cookies = "cookies.txt"  # path to the cookies file
set_cookies(cookies)

options = {"comments": True, "progress": True, "comment_reactors": True}

def format_comment(c):
    obj = {
        "comment_id": c["comment_id"],
        "commenter_id": c["commenter_id"],
        "commenter_name": c["commenter_name"],
        "comment_text": c["comment_text"],
        "comment_reaction_count": c["comment_reaction_count"] or 0,
        "reply_count": len(c["replies"]) if "replies" in c else 0,
        "comment_time": c["comment_time"],
        "type": "comment"
    }
    if c["comment_reactions"]:
        obj.update(c["comment_reactions"])
    return obj

def format_reply(c):
    obj = {
        "comment_id": c["comment_id"],
        "commenter_id": c["commenter_id"],
        "commenter_name": c["commenter_name"],
        "comment_text": c["comment_text"],
        "comment_reaction_count": c["comment_reaction_count"] or 0,
        "reply_count": len(c["replies"]) if "replies" in c else 0,
        "comment_time": c["comment_time"],
        "type": "reply"
    }
    if c["comment_reactions"]:
        obj.update(c["comment_reactions"])
    return obj

fb_comments = []
post = next(get_posts(post_urls=post_ids, options=options))

for comment in post["comments_full"]:
    fb_comments.append(format_comment(comment))

    for reply in comment["replies"]:
        fb_comments.append(format_reply(reply))

pd.DataFrame(fb_comments).to_json("fb_comments.jsonl", orient="records", lines=True)
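To confirm that a result is one of the "URLs only" cases before the loop runs, a check along these lines could be added after `next(get_posts(...))`. This is a sketch under assumptions: the key names `original_request_url` and `post_url` are my guess at how the two returned URLs are labeled, and `is_urls_only` and `missing_keys` are hypothetical helpers.

```python
def missing_keys(post, expected=("comments_full",)):
    """Return the expected keys that are absent from the post dict."""
    return [key for key in expected if key not in post]


def is_urls_only(post, url_keys=("original_request_url", "post_url")):
    """True when the post dict holds nothing beyond the (assumed) URL keys."""
    return set(post) <= set(url_keys)


# Hypothetical result mimicking what the photo posts above return.
sparse_post = {
    "original_request_url": "https://www.facebook.com/85452072376/posts/10158762068092377",
    "post_url": "https://www.facebook.com/85452072376/posts/10158762068092377",
}
sparse_flag = is_urls_only(sparse_post)
sparse_missing = missing_keys(sparse_post)
```

Printing `sorted(post)` for a failing post shows the same thing directly and does not depend on the assumed key names.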

sk1pd1v1d3d • Oct 22 '22 19:10

Has anyone made any progress on collecting comments on posts with images and videos? @neon-ninja, have you experienced any problems while scraping?

ReemOmer • Feb 08 '23 11:02