facebook-scraper
reactors no longer being returned
Just today, reactors stopped being returned for me. The following program exhibits the problem.
from facebook_scraper import get_posts, set_user_agent
from pprint import pprint
import sys
cookie_file = 'facebook_cookies.txt'
set_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
post_ids = sys.argv[1:]
for post_id in post_ids:
    post = next(get_posts(post_urls=[post_id], cookies=cookie_file,
                          options={'allow_extra_requests': False, 'reactors': True}))
    pprint(post)
When I invoke this as
python reactors.py 10158741881073601
the reactors field returned is None, although there are actually reactors to the post. It was working until today.
I merged https://github.com/kevinzg/facebook-scraper/pull/707 into master branch, and that fixed reactor extraction for me for this post. Give latest master branch a try and see how you go.
Hmm, no, that didn't work for me. What's really weird, though, is that I had made the changes in #707 in my own copy of facebook_scraper, and that did work for me - at least for links and names, although not types.
Maybe you still had an old version of the library. Try pip uninstall facebook-scraper twice before running pip install git+https://github.com/kevinzg/facebook-scraper.git
Sorry, I've encountered the same problem.
I ran pip uninstall facebook-scraper twice and then pip install git+https://github.com/kevinzg/facebook-scraper.git. Then I copied the code that George (who raised this issue) posted and made some revisions:
from facebook_scraper import get_posts, set_user_agent
from pprint import pprint
cookies=MY_COOKIES
set_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
post_ids = ["10158741881073601"]
for post_id in post_ids:
    post = next(
        get_posts(
            post_urls=[post_id],
            cookies=cookies,
            options={
                'allow_extra_requests': False,
                'reactors': True
            }
        )
    )
    pprint(post)
and I got output like this (only part of it is shown):
'page_id': None,
'post_id': '10158741881073601',
'post_text': 'Penne Smith Sandbeck',
'post_url': 'https://m.facebook.com/10158741881073601',
'reaction_count': 14,
'reactions': {'care': 3, 'like': 8, 'love': 1, 'sad': 2},
'reactors': [],
'shared_post_id': None,
'shared_post_url': None,
'shared_text': None,
The reactors field is an empty list, but it seems that actually there are some reactors on this post.
I'm not sure, but I may have found the problem.
It seems that an exception is raised on this line:
k = str(demjson.decode(sigil.attrs.get("data-store"))["reactionType"])
I added a breakpoint() before this line and inspected demjson.decode(sigil.attrs.get("data-store")). It returned {'reactionID': 478547315650144}, which doesn't contain the key reactionType.
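The missing key described above can be guarded against with a defensive lookup. This is only a minimal sketch, not the library's actual fix: it uses the standard-library json module in place of demjson, and the helper name reaction_type_from_data_store is hypothetical.

```python
import json
from typing import Optional


def reaction_type_from_data_store(data_store: Optional[str]) -> Optional[str]:
    """Return the reaction type as a string, or None if it is unavailable.

    Hypothetical helper: `data_store` stands for the raw "data-store"
    attribute value taken from the element.
    """
    if not data_store:
        return None
    try:
        data = json.loads(data_store)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # .get() avoids the KeyError raised when "reactionType" is absent.
    value = data.get("reactionType")
    return None if value is None else str(value)


# Payload shaped like the one seen in the debugger: no "reactionType" key.
print(reaction_type_from_data_store('{"reactionID": 478547315650144}'))
# Payload that does carry the key (the value 2 is made up for illustration).
print(reaction_type_from_data_store('{"reactionType": 2}'))
```

Returning None instead of raising lets the caller still record the reactor's name and link when the type cannot be determined.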
I see - try https://github.com/kevinzg/facebook-scraper/commit/5539ec467223286d952c5af5c19d89a8cffedb17
With this commit I get:
'reaction_count': 14,
'reactions': {'care': 3, 'like': 8, 'love': 1, 'sad': 2},
'reactors': [{'link': 'https://facebook.com/lynnswisher.spears?fref=pb',
'name': 'Lynn Swisher Spears',
'type': 'care'},
{'link': 'https://facebook.com/profile.php?id=100001509791215&fref=pb',
'name': 'D.j. Bost',
'type': 'like'},
{'link': 'https://facebook.com/audra.halemaddox?fref=pb',
'name': 'Audra Hale-Maddox',
'type': 'care'},
{'link': 'https://facebook.com/lin.stogner?fref=pb',
'name': 'Lin Stogner',
'type': 'like'},
{'link': 'https://facebook.com/pam.morris.73?fref=pb',
'name': 'Pam Morris',
'type': 'sad'},
{'link': 'https://facebook.com/shane.petersen.507?fref=pb',
'name': 'Shane Petersen',
'type': 'like'},
{'link': 'https://facebook.com/jeroen.vandenhurk?fref=pb',
'name': 'Jeroen van den Hurk',
'type': 'sad'},
{'link': 'https://facebook.com/judy.e.woodall?fref=pb',
'name': 'Judy Edwards Woodall',
'type': 'like'},
{'link': 'https://facebook.com/kari.tgeorge?fref=pb',
'name': 'Kari Turcogeorge',
'type': 'love'},
{'link': 'https://facebook.com/susan.r.briley?fref=pb',
'name': 'Susan Reesman Briley',
'type': 'like'},
{'link': 'https://facebook.com/hutson.nick?fref=pb',
'name': 'Nick Hutson',
'type': 'like'},
{'link': 'https://facebook.com/jeffrey.harris.1441?fref=pb',
'name': 'Jeffrey Harris',
'type': 'care'},
{'link': 'https://facebook.com/holden.richards?fref=pb',
'name': 'Holden Richards',
'type': 'like'},
{'link': 'https://facebook.com/darrell.e.cook?fref=pb',
'name': 'Darrell E. Cook',
'type': 'like'}],
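A quick way to tell whether the reactors list is complete is to tally it by type and compare the totals with the reactions field. The sketch below assumes only the dictionary shape shown in the output above; the helper name reactor_shortfall and the sample data are hypothetical.

```python
from collections import Counter


def reactor_shortfall(post: dict) -> dict:
    """Map each reaction type to how many reactors are missing for it."""
    expected = post.get("reactions") or {}
    found = Counter(r.get("type") for r in (post.get("reactors") or []))
    return {
        kind: count - found.get(kind, 0)
        for kind, count in expected.items()
        if count > found.get(kind, 0)
    }


# Hypothetical, trimmed-down post dict for illustration only.
sample_post = {
    "reactions": {"care": 1, "like": 2},
    "reactors": [
        {"name": "A", "type": "care"},
        {"name": "B", "type": "like"},
    ],
}
print(reactor_shortfall(sample_post))
```

An empty result means every counted reaction has a matching reactor; a non-empty one shows which types came back short.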
I apologize for my ignorance, but will pip install git+https://github.com/kevinzg/facebook-scraper.git install that commit?
It should do, yes
I've also just pushed a new version (0.2.55) to PyPI, so pip install -U facebook-scraper would now do it too.
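Since a stale install came up earlier in this thread, it may help to confirm which version Python actually sees. This is a rough sketch using the standard-library importlib.metadata module (Python 3.8+); the helpers version_tuple and is_at_least are hypothetical, and they assume a plain dotted numeric version string such as 0.2.55.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple:
    """Turn a plain dotted version such as '0.2.55' into (0, 2, 55)."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed: str, required: str) -> bool:
    """Compare two plain dotted versions numerically."""
    return version_tuple(installed) >= version_tuple(required)


REQUIRED = "0.2.55"  # the release mentioned above

try:
    installed = version("facebook-scraper")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("facebook-scraper is not installed")
else:
    try:
        ok = is_at_least(installed, REQUIRED)
        print(f"installed {installed}, at least {REQUIRED}: {ok}")
    except ValueError:
        # Version strings with non-numeric parts are outside this sketch.
        print(f"installed {installed} (could not compare)")
```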
Yes, works great! Thank you!
Thank you so much! It works great for that post, but not for some others, such as 5561190327250419. In some cases it returns only one or two reactors, or even none.
I see - try https://github.com/kevinzg/facebook-scraper/commit/c41e14e1c8271ae82d2e981d64bf8cd21db08a85
Yes! It works great now. Thank you so much!
Sorry, I've run into another problem with reactors. For some specific Facebook fan pages, I can't get the correct post_id and reactors, although it works for others. Below is my test code:
from facebook_scraper import get_posts
from pprint import pprint
cookies = {
    "wd": "XXX",
    "datr": "XXX",
    "sb": "XXX",
    "c_user": "XXX",
    "xs": "XXX",
    "fr": "XXX",
}
posts = get_posts(
    post_urls=["https://facebook.com/story.php?story_fbid=524560755699898&id=100044379341462"],
    options={
        "allow_extra_requests": False,
        "comments": "generator",
        "reactors": True,
        "reactions": True,
        "comment_reactors": False,
    },
    cookies=cookies,
)
post = next(posts)
pprint(post)
Here is part of the output. The post_id seems to be taken from the first comment instead of the post itself, and the reactions and reactors fields have the same problem.
'page_id': '1536864699976440',
'post_id': '524560755699898_524611979028109',
'post_text': '同學、學長、學妹傳來的照片\n'
'阿金的書在吉隆坡 IPC, 新山 Mid Valley, 新加坡 Popular 目前都有展示\n'
'\n'
'IPC 還是「海景第一排」呢!\n'
'和新馬的朋友分享~',
'post_url': 'https://facebook.com/story.php?story_fbid=524560755699898&id=100044379341462',
'reaction_count': 1,
'reactions': {'like': 1},
'reactors': [{'link': 'https://facebook.com/icudoctor?fref=pb',
'name': 'Icu醫生陳志金',
'type': 'like'}],
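The mismatch above can be detected automatically by comparing the returned post_id with the story_fbid parameter in the post URL. This is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes the URL carries a story_fbid query parameter as in this example.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def story_id_from_url(url: str) -> Optional[str]:
    """Extract the story_fbid query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("story_fbid")
    return values[0] if values else None


def post_id_matches_url(post: dict) -> bool:
    """True when post_id equals the story_fbid found in post_url."""
    expected = story_id_from_url(post.get("post_url") or "")
    # str() so that an integer post_id compares equal to the URL value.
    return expected is not None and str(post.get("post_id")) == expected


url = "https://facebook.com/story.php?story_fbid=524560755699898&id=100044379341462"
# post_id values taken from the outputs reported in this thread.
print(post_id_matches_url({"post_url": url, "post_id": "524560755699898_524611979028109"}))
print(post_id_matches_url({"post_url": url, "post_id": "524560755699898"}))
```

A False result is a hint that the scraper picked up a comment's ID rather than the post's.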
I found some discussions about cookies, and here is the output after I added "noscript": "1" to my cookies:
'page_id': '1536864699976440',
'post_id': '524560755699898',
'post_text': '同學、學長、學妹傳來的照片\n'
'阿金的書在吉隆坡 IPC, 新山 Mid Valley, 新加坡 Popular 目前都有展示\n'
'\n'
'IPC 還是「海景第一排」呢!\n'
'和新馬的朋友分享~',
'post_url': 'https://facebook.com/story.php?story_fbid=524560755699898&id=100044379341462',
'reaction_count': None,
'reactions': None,
'reactors': None,
The post_id is correct now, but the reactors and reactions fields are now None.
Here's the output I get with your test code:
'page_id': '1536864699976440',
'post_id': 524560755699898,
'post_text': '同學、學長、學妹傳來的照片\n'
'阿金的書在吉隆坡 IPC, 新山 Mid Valley, 新加坡 Popular 目前都有展示\n'
'\n'
'IPC 還是「海景第一排」呢!\n'
'和新馬的朋友分享~',
'post_url': 'https://facebook.com/story.php?story_fbid=524560755699898&id=100044379341462',
'reaction_count': 568,
'reactions': {'care': 1, 'like': 563, 'love': 2, 'wow': 2},
'reactors': [{'link': 'https://facebook.com/profile.php?id=100080093550026&fref=pb',
'name': 'Leo Hsu',
'type': 'like'},
{'link': 'https://facebook.com/profile.php?id=100077404213696&fref=pb',
'name': '林幸君',
'type': 'like'},
Try updating to the latest master branch, and try setting your Facebook language to English. Also try:
from facebook_scraper import _scraper
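# encoding="utf-8" so that non-ASCII page text is written correctly on any platform
with open("524560755699898.html", "w", encoding="utf-8") as f:
    f.write(_scraper.get("524560755699898").html.html)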
and upload the resulting HTML file