facebook-scraper
missing replies/comment threads
First of all, thank you for the awesome code!
Back to my issue: how can I get replies to comments? I get all the comments under posts and their reactors (haha, wow, ...), but not the replies. Is there a way to scrape all the comments together with their nested replies?
from pprint import pprint
from facebook_scraper import *
posts = get_posts(
    '1210214419806423',
    pages=1,
    extra_info=True,
    credentials=("user", "pw"),
    options={"comments": True, "reactors": True, "allow_extra_requests": True, "extra_info": True, "progress": True, "posts_per_page": 1, "from_browser": True},
)
for post in posts:
    pprint(post)
put the credentials or cookies.json file (email and password)
I use credentials and have also tried it with cookies. I get the post and the first-level comments, but not the replies to the comments, so there is no thread.
1210214419806423 is a post, not a page, so your invocation of get_posts is incorrect. You should use the post_urls argument to signify that this is a post. The code:
from facebook_scraper import get_posts, set_cookies

set_cookies("cookies.json")
post = next(get_posts(post_urls=['1210214419806423'], options={"comments": True}))
print(f"Comments: {post['comments']}, Top level comments: {len(post['comments_full'])}, Replies: {sum(len(c['replies']) for c in post['comments_full'])}")
outputs:
Comments: 108, Top level comments: 9, Replies: 66
for me.
It returns different output for me.
Comments: 123, Top level comments: 10, Replies: 0
When I print(post), it returns comment_text only for top-level comments (10 comments out of 123). As far as I can see, it pulls only the most relevant comments. Is it possible to pull all comments (comment_text) with replies?
it returns different output for me. Comments: 123, Top level comments: 10, Replies: 0
Try enabling debug logging as per the issue template, and then post the logs.
when i print(post) it returns only comment_text for top level comments(10 comments out of 123),
The comment count is approximately the sum of the top level comments and the replies. It is only approximate because some comments get suppressed as spam.
-the way i see it pulls only most relevant comments. is it possible to pull all comments( comment_text) with replies?
No, we're limited by the functionality available on m.facebook.com
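To make the gap between the reported count and what was actually scraped easier to see, here is a minimal sketch that works on a post dict like the one printed above. It only relies on the comments, comments_full and replies keys used earlier in this thread; the helper name summarize_comments and the example data are made up for illustration.

```python
def summarize_comments(post):
    """Compare the reported comment count with what was actually scraped."""
    top_level = post.get("comments_full") or []
    # Replies are nested under each top-level comment.
    replies = sum(len(c.get("replies") or []) for c in top_level)
    reported = post.get("comments") or 0
    return {
        "reported": reported,
        "top_level": len(top_level),
        "replies": replies,
        # Comments counted by Facebook but not present in the scraped data.
        "missing": reported - (len(top_level) + replies),
    }


# Hypothetical post dict, shaped like the scraper output discussed above.
example_post = {
    "comments": 5,
    "comments_full": [
        {"comment_text": "first", "replies": [{"comment_text": "a"}, {"comment_text": "b"}]},
        {"comment_text": "second", "replies": []},
    ],
}

summary = summarize_comments(example_post)
print(summary)
```

For the numbers reported above (123 comments, 10 top level, 0 replies), missing would come out as 113, which is far more than spam suppression alone would be expected to explain.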
Thank you, log:
C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\facebook_scraper\facebook_scraper.py:857: UserWarning: Facebook says 'Unsupported Browser'
warnings.warn(f"Facebook says 'Unsupported Browser'")
Got exact timestamp from publish_time: 2022-06-02 08:35:03
Fetching https://m.facebook.com/hoaxPZ/photos/a.317666309061243/1210208423140356/?type=3&source=57&refid=52&__tn__=EH-R
[pfbid023tpAp6bZ14p2bb2sq9GS1kE4zcQMbLKd61noB4AcWe6Sm2op1V6k3qWvKn2R7GJvl] Extract method extract_video didn't return anything
[pfbid023tpAp6bZ14p2bb2sq9GS1kE4zcQMbLKd61noB4AcWe6Sm2op1V6k3qWvKn2R7GJvl] Extract method extract_video_thumbnail didn't return anything
[pfbid023tpAp6bZ14p2bb2sq9GS1kE4zcQMbLKd61noB4AcWe6Sm2op1V6k3qWvKn2R7GJvl] Extract method extract_video_id didn't return anything
[pfbid023tpAp6bZ14p2bb2sq9GS1kE4zcQMbLKd61noB4AcWe6Sm2op1V6k3qWvKn2R7GJvl] Extract method extract_video_meta didn't return anything
[pfbid023tpAp6bZ14p2bb2sq9GS1kE4zcQMbLKd61noB4AcWe6Sm2op1V6k3qWvKn2R7GJvl] Extract method extract_factcheck didn't return anything
[pfbid023tpAp6bZ14p2bb2sq9GS1kE4zcQMbLKd61noB4AcWe6Sm2op1V6k3qWvKn2R7GJvl] Extract method extract_share_information didn't return anything
[pfbid023tpAp6bZ14p2bb2sq9GS1kE4zcQMbLKd61noB4AcWe6Sm2op1V6k3qWvKn2R7GJvl] Extract method extract_listing didn't return anything
[pfbid023tpAp6bZ14p2bb2sq9GS1kE4zcQMbLKd61noB4AcWe6Sm2op1V6k3qWvKn2R7GJvl] Extract method extract_with didn't return anything
Fetching up to 11 comments
Fetching https://m.facebook.com/story.php?story_fbid=pfbid023tpAp6bZ14p2bb2sq9GS1kE4zcQMbLKd61noB4AcWe6Sm2op1V6k3qWvKn2R7GJvl&id=313187412842466&locale=en_US&story_fbid=pfbid023tpAp6bZ14p2bb2sq9GS1kE4zcQMbLKd61noB4AcWe6Sm2op1V6k3qWvKn2R7GJvl&id=313187412842466&p=10&av=100008157765660&eav=AfZ0PJeg3ORKKz71gxK7NMiQC4gpdt7WYHuzi0bIhdG0FF1P2tpgXxZMXycHoLfyMmg&paipv=0&refid=52
No comments found on page
Fetching /comment/replies/?ctoken=1210214419806423_2253965094755184&count=31&curr&pc=1&isinline&initcomp&ft_ent_identifier=pfbid023tpAp6bZ14p2bb2sq9GS1kE4zcQMbLKd61noB4AcWe6Sm2op1V6k3qWvKn2R7GJvl&eav=AfbMRZHDCLXxWLYA4CYbGfZxsEGFz2gncx-d_af9s1tp1izO02Z17Z68UqMB1bV-82U&av=100008157765660&gfid=AQCJWD-akk3sOx6BCy0&refid=52&__tn__=R
Content Not Found
Fetching /comment/replies/?ctoken=1210214419806423_710454313523867&count=26&curr&pc=1&isinline&initcomp&ft_ent_identifier=pfbid023tpAp6bZ14p2bb2sq9GS1kE4zcQMbLKd61noB4AcWe6Sm2op1V6k3qWvKn2R7GJvl&eav=Afaxquv8U2G9nftRfSan71bI2xVWWSAbQqj0e8Jyt01FRBq1SPB-9fucA1hPD9H6v-c&av=100008157765660&gfid=AQBQdWblPffobUQwwus&refid=52&__tn__=R
Content Not Found
Fetching /comment/replies/?ctoken=1210214419806423_411849410806499&count=21&curr&pc=1&isinline&initcomp&ft_ent_identifier=pfbid023tpAp6bZ14p2bb2sq9GS1kE4zcQMbLKd61noB4AcWe6Sm2op1V6k3qWvKn2R7GJvl&eav=AfZTF6ZXB23RoNPJh7m5LOPLViYEQdZyLTY7iiSJLDan-pbcRXDJbdaTjKoxsREVlNY&av=100008157765660&gfid=AQAI8RBj0n1Ej1H0g9M&refid=52&__tn__=R
Content Not Found
Fetching /comment/replies/?ctoken=1210214419806423_762853668206533&count=2&curr&pc=1&isinline&initcomp&ft_ent_identifier=pfbid023tpAp6bZ14p2bb2sq9GS1kE4zcQMbLKd61noB4AcWe6Sm2op1V6k3qWvKn2R7GJvl&eav=AfanRFVvpFvkGhnxQsu_hePDxfvlSvSSrg2wIFGBTfnkzlCEMn72QTyKdvhjHSyICUM&av=100008157765660&gfid=AQAaI1jTw0m5PgBhno0&refid=52&__tn__=R
Content Not Found
Comments: 123, Top level comments: 10, Replies: 0
Do you have a noscript cookie? Try updating lxml with pip install -U lxml
I had been missing lxml. Now it seems I also retrieve replies. I'm not sure what you mean by a noscript cookie; my cookies look like this:
[
  {
    "name": "xxxx",
    "value": "",
    "domain": ".facebook.com",
    "path": "/",
    "expires": xxxx,
    "httpOnly": true,
    "secure": true
  },
  {
    "name": "sb",
    "value": "xxxx",
    "domain": ".facebook.com",
    "path": "/",
    "expires": xxxx,
    "httpOnly": true,
    "secure": true
  },
  {
    "name": "c_user",
    "value": "xxxx",
    "domain": ".facebook.com",
    "path": "/",
    "expires": xxxx,
    "httpOnly": false,
    "secure": true
  },
  {
    "name": "wd",
    "value": "xxxx",
    "domain": ".facebook.com",
    "path": "/",
    "expires": xxxx,
    "httpOnly": false,
    "secure": true,
    "sameSite": "Lax"
  },
  {
    "name": "xs",
    "value": "xxxx",
    "domain": ".facebook.com",
    "path": "/",
    "expires": xxxx,
    "httpOnly": true,
    "secure": true
  },
  {
    "name": "fr",
    "value": "xxxx",
    "domain": ".facebook.com",
    "path": "/",
    "expires": xxxx,
    "httpOnly": true,
    "secure": true
  },
  {
    "name": "presence",
    "value": "xxxx",
    "domain": ".facebook.com",
    "path": "/",
    "expires": -1,
    "httpOnly": false,
    "secure": true
  }
]
been missing lxml - now it seems i retrieve also replies. not sure what you mean by noscript cookie, my cookies looks like this:
Hello, how did you extract your cookies? I've had the same issue for a month and still haven't resolved it, even with lxml and the snippet posted here. I get the same result you got a few posts earlier:
Comments: 123, Top level comments: 10, Replies: 0
@neon-ninja what is your suggestion for extracting the cookies correctly?
I've exported them with EditThisCookie and also with Get cookies (using default settings) on Chrome, but it's still not working.
Either should work fine. set_cookies should raise an exception if your cookies are invalid, so if it doesn't, your cookies are fine.
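If you want a quick sanity check on the exported file before handing it to set_cookies, something like the sketch below can help. It assumes the export is a JSON list of objects with name and value fields, as in the example earlier in this thread, and it assumes (this is not confirmed in this thread) that c_user and xs are the login cookies that matter. The helper name missing_login_cookies is made up, and this check is no substitute for set_cookies itself.

```python
import json

# Assumption: these are the cookies a logged-in session needs.
REQUIRED_COOKIES = ("c_user", "xs")


def missing_login_cookies(cookies):
    """Return the required cookie names that are absent or have an empty value."""
    present = {c.get("name") for c in cookies if c.get("value")}
    return [name for name in REQUIRED_COOKIES if name not in present]


# Hypothetical export with placeholder values; in practice, load your own file with
# json.load(open("cookies.json")).
exported = json.loads("""
[
  {"name": "sb", "value": "xxxx", "domain": ".facebook.com"},
  {"name": "c_user", "value": "xxxx", "domain": ".facebook.com"},
  {"name": "xs", "value": "", "domain": ".facebook.com"}
]
""")

missing = missing_login_cookies(exported)
print(missing or "login cookies look present")
```

In this made-up export the xs cookie has an empty value, so it is reported as missing.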
I'm starting to suspect that there has been some change on fb because as you say my cookies are correct. Could you describe the steps that occur during the extraction of comments? I would like to try to solve the problem.
Sure. Here's the relevant function - extract_comments_full in extractors.py
https://github.com/kevinzg/facebook-scraper/blob/10ad8b47ad15b175bb474311c3c4e7860b6da5de/facebook_scraper/extractors.py#L1143
This function handles identifying the comments area and paginating through comments. For each comment, this function calls the extract_comment_with_replies function:
https://github.com/kevinzg/facebook-scraper/blob/10ad8b47ad15b175bb474311c3c4e7860b6da5de/facebook_scraper/extractors.py#L1120
This function calls the parse_comment function to parse the top level comment:
https://github.com/kevinzg/facebook-scraper/blob/10ad8b47ad15b175bb474311c3c4e7860b6da5de/facebook_scraper/extractors.py#L1008
If there are replies (as detected by this selector - https://github.com/kevinzg/facebook-scraper/blob/10ad8b47ad15b175bb474311c3c4e7860b6da5de/facebook_scraper/extractors.py#L1127), extract_comment_with_replies calls extract_comment_replies:
https://github.com/kevinzg/facebook-scraper/blob/10ad8b47ad15b175bb474311c3c4e7860b6da5de/facebook_scraper/extractors.py#L1097
For each reply, parse_comment is called to parse the reply comment
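The call chain described above can be sketched roughly as follows. This is a simplified stand-in, not the library's code: the real functions work on HTML and fetch replies with extra requests (the /comment/replies/ fetches in the log earlier), whereas here comments are plain dicts, pagination is a list of pages, and a non-empty replies list plays the role of the selector. Only the function names come from extractors.py; their signatures and the data shapes are assumptions.

```python
def parse_comment(node):
    """Stand-in for parse_comment: turn one raw comment into a flat dict."""
    return {"comment_text": node["text"]}


def extract_comment_replies(node):
    """Stand-in for extract_comment_replies: parse every reply under a comment."""
    return [parse_comment(reply) for reply in node.get("replies", [])]


def extract_comment_with_replies(node):
    """Parse the top-level comment, then attach its replies if any were detected."""
    result = parse_comment(node)
    # The real code uses a selector to detect replies; here a non-empty list does.
    result["replies"] = extract_comment_replies(node) if node.get("replies") else []
    return result


def extract_comments_full(pages):
    """Walk every page of comments and yield each comment with its replies."""
    for page in pages:
        for node in page:
            yield extract_comment_with_replies(node)


# Hypothetical input: two pages of comments, one comment with a single reply.
pages = [
    [{"text": "top 1", "replies": [{"text": "reply 1a"}]}],
    [{"text": "top 2"}],
]
comments = list(extract_comments_full(pages))
print(comments)
```

If the reply-fetching step returns nothing, as with the Content Not Found responses in the log, every comment ends up with an empty replies list, which matches the "Replies: 0" output.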
Hope that helps!