facebook-scraper
KeyError: 'comments_full' when trying to scrape comments on photo posts
Hello,
I would appreciate any help I can get with a problem I'm experiencing trying to scrape comments on photo posts.
I received the following error message
"Traceback (most recent call last): line 42, in for comment in post["comments_full"]: KeyError: 'comments_full'"
when I ran the following code
from facebook_scraper import *
import pandas as pd
import jsonlines
import os
import time

post_ids = ["https://www.facebook.com/85452072376/posts/10158739652817377"]
cookies = "cookies.txt"
set_cookies("cookies.txt")
options = {"comments": True, "progress": True, "comment_reactors": True}

def format_comment(c):
    obj = {
        "comment_id": c["comment_id"],
        "comment_text": c["comment_text"],
        "comment_reaction_count": c["comment_reaction_count"] or 0,
        "reply_count": len(c["replies"]) if "replies" in c else 0,
        "comment_time": c["comment_time"],
        "type": "comment"
    }
    if c["comment_reactions"]:
        obj.update(c["comment_reactions"])
    return obj

def format_reply(c):
    obj = {
        "comment_id": c["comment_id"],
        "comment_text": c["comment_text"],
        "comment_reaction_count": c["comment_reaction_count"] or 0,
        "reply_count": len(c["replies"]) if "replies" in c else 0,
        "comment_time": c["comment_time"],
        "type": "reply"
    }
    if c["comment_reactions"]:
        obj.update(c["comment_reactions"])
    return obj

fb_comments = []
post = next(get_posts(post_urls=post_ids, options=options))
for comment in post["comments_full"]:
    fb_comments.append(format_comment(comment))
    for reply in comment["replies"]:
        fb_comments.append(format_reply(reply))
pd.DataFrame(fb_comments).to_json("fbcomments.jsonl", orient="records", lines=True)
This seems to affect all photo posts, as it happened to the post at https://www.facebook.com/5281959998/posts/10152772906124999 as well as a few other photo posts I've tried.
I'd appreciate any help.
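As a stopgap, the loop can be written so that a post without a "comments_full" key yields no rows instead of raising. The sketch below is only an illustration: flatten_comments is a hypothetical helper, not part of facebook-scraper, and it works on a plain dict shaped like the posts in the code above. It avoids the crash but does not recover the missing comments.

```python
def flatten_comments(post):
    """Return a flat list of comment and reply rows, or [] if the post has none."""
    rows = []
    # .get() avoids the KeyError when the post dict has no "comments_full" key
    for comment in post.get("comments_full") or []:
        replies = comment.get("replies") or []
        rows.append({
            "comment_id": comment.get("comment_id"),
            "comment_text": comment.get("comment_text"),
            "reply_count": len(replies),
            "type": "comment",
        })
        for reply in replies:
            rows.append({
                "comment_id": reply.get("comment_id"),
                "comment_text": reply.get("comment_text"),
                "reply_count": 0,
                "type": "reply",
            })
    return rows
```

An empty result then signals that the scraper returned no comment data for that post, which is easier to log and retry than an exception in the middle of a run.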
The same happened to me with posts containing photos or videos. Did you find a solution?
I haven't found a solution for photo posts, but I've been able to scrape comments on video posts. I believe this comment/thread references previous issues with scraping comments on video posts and a solution that resolved those issues. You may need to update to the latest version of facebook-scraper if you haven't done so already. I've been able to scrape comments on video posts by entering only the part of the URL that starts with watch (for example, "watch/?v=1240637946380306") as the post ID.
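For anyone starting from full video URLs, the "watch/?v=..." part can be cut out with the standard library. This is a minimal sketch; to_watch_id is a hypothetical helper name, and it assumes the video ID is carried in the v query parameter as in the example above.

```python
from urllib.parse import urlparse, parse_qs

def to_watch_id(url):
    """Return 'watch/?v=<id>' for a Facebook watch URL, or None otherwise."""
    parsed = urlparse(url)
    video_ids = parse_qs(parsed.query).get("v")
    # Only accept URLs whose path ends in "watch" and that carry a v parameter
    if not parsed.path.strip("/").endswith("watch") or not video_ids:
        return None
    return f"watch/?v={video_ids[0]}"
```

The returned string can then be used as the post ID in the same way as the hand-trimmed value described above.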
I tried again to scrape photo posts, and it's working for me now. I'm not sure what changed, but sorry for the false alarm.
Actually, I was mistaken; this is still an issue. Scraping comments seems to work for photo posts if clicking on the date posted gives a URL with a pfbid (e.g., "https://www.facebook.com/washingtonpost/posts/pfbid0S8mnLvJpa3JxEoZsdB8RL1FRPcx47zA9YodMD1qQJorngMCTAU1r7NKM9pq7EiRal"). However, other photo post URLs return only the original request URL and the post URL. All of the following posts are examples of photo posts that return only the original request URL and the post URL:
https://www.facebook.com/85452072376/posts/10158762068092377
https://www.facebook.com/85452072376/posts/10158739652817377
https://www.facebook.com/85452072376/posts/10158697208027377
https://www.facebook.com/85452072376/posts/10158765735462377
https://www.facebook.com/85452072376/posts/10158606605817377
https://www.facebook.com/85452072376/posts/10158664085227377
https://www.facebook.com/85452072376/posts/10158695338337377
https://www.facebook.com/5281959998/posts/10152658141439999
https://www.facebook.com/5281959998/posts/10152772906124999
https://www.facebook.com/5281959998/posts/10152670028269999
https://www.facebook.com/5281959998/posts/10152602913874999
https://www.facebook.com/5281959998/posts/10152702669299999
https://www.facebook.com/5281959998/posts/10152787057339999
https://www.facebook.com/5281959998/posts/10152678807814999
https://www.facebook.com/5281959998/posts/10152759098719999
https://www.facebook.com/5281959998/posts/10152590088004999
https://www.facebook.com/5281959998/posts/10152590235174999
https://www.facebook.com/5281959998/posts/10152672624454999
https://www.facebook.com/5281959998/posts/10152780019704999
These are the URLs that appear after clicking the date posted for the above URLs:
https://www.facebook.com/newsmax/photos/a.10151127234237377/10158762068092377
https://www.facebook.com/newsmax/photos/a.10151127234237377/10158739652817377
https://www.facebook.com/newsmax/photos/a.10151127234237377/10158697208027377
https://www.facebook.com/newsmax/photos/a.10151127234237377/10158765735462377
https://www.facebook.com/newsmax/photos/a.10151127234237377/10158606605817377
https://www.facebook.com/newsmax/photos/a.10151127234237377/10158664085227377
https://www.facebook.com/newsmax/photos/a.10151127234237377/10158695338337377
https://www.facebook.com/nytimes/photos/a.283559809998/10152658141439999
https://www.facebook.com/nytimes/photos/a.283559809998/10152772906124999
https://www.facebook.com/nytimes/photos/a.283559809998/10152670028269999
https://www.facebook.com/nytimes/photos/a.283559809998/10152602913874999
https://www.facebook.com/nytimes/photos/a.283559809998/10152702669299999
https://www.facebook.com/nytimes/photos/a.283559809998/10152787057339999
https://www.facebook.com/nytimes/photos/a.283559809998/10152678807814999
https://www.facebook.com/nytimes/photos/a.283559809998/10152759098719999
https://www.facebook.com/nytimes/photos/a.283559809998/10152590088004999
https://www.facebook.com/nytimes/photos/a.283559809998/10152590235174999
https://www.facebook.com/nytimes/photos/a.283559809998/10152672624454999
https://www.facebook.com/nytimes/photos/a.283559809998/10152780019704999
I've tried using the initial URLs and the URLs that appear after clicking the date posted as post IDs, but they still only return the original request URL and the post URL.
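To sort a batch of URLs into the shapes described above before scraping, something like the following might help. It is a rough sketch: classify_post_url and its category labels are made up for illustration, and the patterns are inferred only from the example URLs in this thread, so other URL shapes will fall into "other".

```python
import re
from urllib.parse import urlparse

def classify_post_url(url):
    """Label a Facebook post URL by the shape of its path."""
    path = urlparse(url).path
    if re.search(r"/posts/pfbid\w+", path):
        return "pfbid"          # shape that scraped successfully in this thread
    if re.search(r"/photos/a\.\d+/\d+", path):
        return "photo_album"    # shape shown after clicking the date posted
    if re.search(r"/posts/\d+", path):
        return "numeric"        # <page id>/posts/<post id>
    return "other"

urls = [
    "https://www.facebook.com/85452072376/posts/10158762068092377",
    "https://www.facebook.com/newsmax/photos/a.10151127234237377/10158762068092377",
]
for u in urls:
    print(classify_post_url(u), u)
```

Grouping URLs this way makes it easier to report which shapes return comments and which return only the request URL and post URL.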
Here is the code I used:
from facebook_scraper import *
import pandas as pd
import jsonlines

post_ids = ["https://www.facebook.com/85452072376/posts/10158762068092377"]
cookies = "cookies.txt"
set_cookies("cookies.txt")
options = {"comments": True, "progress": True, "comment_reactors": True}

def format_comment(c):
    obj = {
        "comment_id": c["comment_id"],
        "commenter_id": c["commenter_id"],
        "commenter_name": c["commenter_name"],
        "comment_text": c["comment_text"],
        "comment_reaction_count": c["comment_reaction_count"] or 0,
        "reply_count": len(c["replies"]) if "replies" in c else 0,
        "comment_time": c["comment_time"],
        "type": "comment"
    }
    if c["comment_reactions"]:
        obj.update(c["comment_reactions"])
    return obj

def format_reply(c):
    obj = {
        "comment_id": c["comment_id"],
        "commenter_id": c["commenter_id"],
        "commenter_name": c["commenter_name"],
        "comment_text": c["comment_text"],
        "comment_reaction_count": c["comment_reaction_count"] or 0,
        "reply_count": len(c["replies"]) if "replies" in c else 0,
        "comment_time": c["comment_time"],
        "type": "reply"
    }
    if c["comment_reactions"]:
        obj.update(c["comment_reactions"])
    return obj

fb_comments = []
post = next(get_posts(post_urls=post_ids, options=options))
for comment in post["comments_full"]:
    fb_comments.append(format_comment(comment))
    for reply in comment["replies"]:
        fb_comments.append(format_reply(reply))
pd.DataFrame(fb_comments).to_json("fb_comments.jsonl", orient="records", lines=True)
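To confirm that a returned post really contains nothing beyond the two URLs, it can help to list which fields hold a value before iterating over comments. The sketch below is an assumption-laden illustration: populated_fields and looks_like_stub are hypothetical helpers, the key names "original_request_url" and "post_url" are assumed to be how facebook-scraper labels those two URLs, and the test for "empty" (None, empty string, empty list or dict) is a guess at what an unpopulated field looks like.

```python
def populated_fields(post):
    """Return the sorted names of fields that hold a non-empty value."""
    empty_values = (None, "", [], {})
    return sorted(k for k, v in post.items() if v not in empty_values)

def looks_like_stub(post):
    """True when nothing other than the two URL fields is populated."""
    # Assumed key names; adjust if the scraper labels these fields differently
    url_only = {"original_request_url", "post_url"}
    return set(populated_fields(post)) <= url_only
```

Calling looks_like_stub(post) right after next(get_posts(...)) would let the script skip or log such posts rather than fail on post["comments_full"].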
Has anyone made any progress on collecting comments on posts with images and videos? @neon-ninja, have you experienced any problems while scraping?