facebook_page_scraper

Get Comments content

Open marcomameli1992 opened this issue 3 years ago • 6 comments

Hello, is it possible to get the content of the comments on a post? Your code currently only gets the number of comments (I had to change the list index from 0 to 1 in the method that gets this number), but I would like to know whether the text and reactions of each comment can be extracted from a post. Do you have any advice?

marcomameli1992 avatar Feb 02 '21 11:02 marcomameli1992

Do you want the comments' replies in the output as well?

If you don't need the reply text, you can make a simple request to the post's URL. The response will contain a JavaScript object inside a <script> tag, with one object per comment, and it has almost all the data you mentioned.

I'm pasting an object for a single comment below.

{
    "node": {
        "id": "Y29tbWVudDo4MjcwMzgzMzQzNzQ3NDNfODg0Nzk0ODQ1MjY1NzU4",
        "legacy_fbid": "884794845265758",
        "author": {
            "__typename": "User",
            "id": "100006874397694",
            "name": "S\u00ea\u00f1haj Aziz Senhajii",
            "__isActor": "User",
            "profile_picture_depth_0": {
                "uri": "https:\/\/scontent-bom1-1.xx.fbcdn.net\/v\/t1.0-1\/cp0\/c5.0.32.32a\/p32x32\/145176629_2803474619891657_7543331979447625897_o.jpg?_nc_cat=101&ccb=2&_nc_sid=7206a8&_nc_ohc=QZQMvNTFI0kAX-gLZfj&_nc_ht=scontent-bom1-1.xx&tp=27&oh=ed36f326be715270d169e14b2df81123&oe=603F00BC"
            },
            "profile_picture_depth_1": {
                "uri": "https:\/\/scontent-bom1-1.xx.fbcdn.net\/v\/t1.0-1\/cp0\/c4.0.24.24a\/p24x24\/145176629_2803474619891657_7543331979447625897_o.jpg?_nc_cat=101&ccb=2&_nc_sid=7206a8&_nc_ohc=QZQMvNTFI0kAX-gLZfj&_nc_ht=scontent-bom1-1.xx&tp=27&oh=45788eb7e6835ac99445d2d1e1705acf&oe=60401989"
            },
            "gender": "MALE",
            "__isEntity": "User",
            "url": "https:\/\/www.facebook.com\/people\/S\u0025C3\u0025AA\u0025C3\u0025B1haj-Aziz-Senhajii\/100006874397694",
            "work_info": null,
            "is_verified": false,
            "short_name": "Senhajii"
        },
        "is_author_weak_reference": false,
        "created_time": 1587155108,
        "spam_display_mode": "none",
        "attachments": [],
        "comment_menu_tooltip": null,
        "should_show_comment_menu": false,
        "private_reply_context": null,
        "feedback": {
            "id": "ZmVlZGJhY2s6ODI3MDM4MzM0Mzc0NzQzXzg4NDc5NDg0NTI2NTc1OA==",
            "page_private_reply": null,
            "viewer_actor": null,
            "viewer_feedback_reaction_info": null,
            "supported_reactions": [
                {
                    "key": 1
                },
                {
                    "key": 2
                },
                {
                    "key": 4
                },
                {
                    "key": 3
                },
                {
                    "key": 7
                },
                {
                    "key": 8
                }
            ],
            "associated_video": null,
            "top_reactions": {
                "edges": [
                    {
                        "reaction_count": 4,
                        "node": {
                            "key": 1,
                            "id": "1635855486666999",
                            "reaction_type": "LIKE"
                        }
                    },
                    {
                        "reaction_count": 1,
                        "node": {
                            "key": 2,
                            "id": "1678524932434102",
                            "reaction_type": "LOVE"
                        }
                    }
                ]
            },
            "reactors": {
                "count": 5,
                "is_empty": false,
                "count_reduced": "\u096b"
            },
            "can_viewer_comment": false,
            "can_viewer_react": false,
            "comment_composer_placeholder": "\u092a\u094d\u0930\u0924\u094d\u0924\u094d\u092f\u0941\u0924\u094d\u0924\u0930 \u0932\u093f\u0939\u093e...",
            "public_conversations_context": {
                "comment_vote_ui_version": "NONE"
            },
            "should_show_top_reactions": true,
            "ask_me_anything_feedback_metadata": null,
            "comment_count": {
                "total_count": 6
            },
            "toplevel_comment_count": {
                "count": 6
            },
            "threading_config": {
                "__typename": "NoThreadingFeedbackConfig"
            },
            "can_viewer_pin_live_comments": false,
            "latest_pinned_comment_event": null,
            "work_answer_event_action_links_comment_renderer": null,
            "subscription_target_id": "884794845265758",
            "display_comments": {
                "highlighted_comments": [],
                "comment_order": "RANKED_REPLIES",
                "expanded_sub_reply_parents": [],
                "is_initially_expanded": false,
                "page_size": 50,
                "reply_comment_order": "RANKED_REPLIES",
                "should_render_composer_preemptively": false,
                "after_count": 6,
                "before_count": 0,
                "count": 6,
                "edges": [],
                "page_info": {
                    "end_cursor": null,
                    "has_next_page": true,
                    "has_previous_page": false,
                    "start_cursor": null
                }
            },
            "associated_group": null
        },
        "upvote_downvote_total": 0,
        "viewer_comment_vote_state": "NONE",
        "work_ama_answer_status": null,
        "page_admin_actor_info": null,
        "is_author_banned_by_content_owner": false,
        "can_viewer_upvote_downvote": false,
        "comment_parent": null,
        "edit_history": {
            "count": 0
        },
        "parent_feedback": {
            "can_viewer_ban_user": false,
            "can_viewer_comment": false,
            "viewer_acts_as_page": null,
            "id": "ZmVlZGJhY2s6ODI3MDM4MzM0Mzc0NzQz",
            "share_fbid": "827038334374743",
            "political_figure_data": null
        },
        "ban_action": "BAN",
        "preferred_body": {
            "__typename": "TextWithEntities",
            "translation_type": "ORIGINAL",
            "delight_ranges": [],
            "image_ranges": [],
            "inline_style_ranges": [],
            "aggregated_ranges": [],
            "ranges": [],
            "color_ranges": [],
            "text": "In have problem in my whatsapp business "
        },
        "translatability_for_viewer": {
            "source_dialect_name": "\u0907\u0902\u0917\u094d\u0930\u091c\u0940"
        },
        "translation_available_for_viewer": false,
        "url": "https:\/\/www.facebook.com\/Whatsappforbusiness\/posts\/827038334374743?comment_id=884794845265758",
        "is_hidden_by_content_owner": null,
        "if_viewer_can_share": null,
        "body_renderer": {
            "__typename": "TextWithEntities",
            "delight_ranges": [],
            "image_ranges": [],
            "inline_style_ranges": [],
            "aggregated_ranges": [],
            "ranges": [],
            "color_ranges": [],
            "text": "In have problem in my whatsapp business ",
            "__module_operation_CometUFICommentBody_comment": {
                "__dr": "CometUFICommentBodyTextWithEntities_textWithEntities$normalization.graphql"
            },
            "__module_component_CometUFICommentBody_comment": {
                "__dr": "CometUFICommentBodyTextWithEntities.react"
            }
        },
        "timestamp_in_video": null,
        "written_while_video_was_live": false,
        "group_comment_info": null,
        "has_constituent_badge": false,
        "can_see_constituent_badge_upsell": false,
        "legacy_token": "827038334374743_884794845265758",
        "question_and_answer_type": null,
        "is_author_original_poster": false,
        "is_viewer_comment_poster": false,
        "is_author_bot": false,
        "is_author_non_coworker": false,
        "author_user_signals_renderer": null,
        "author_badge_renderers": [],
        "identity_badges_web": [],
        "can_show_multiple_identity_badges": false,
        "earned_identity_badges_web": [],
        "can_viewer_disable_preview": false,
        "inline_survey_config": null,
        "attached_story": null,
        "work_answered_event_comment_renderer": null,
        "elevated_comment_data": null,
        "body": {
            "text": "In have problem in my whatsapp business ",
            "ranges": []
        },
        "is_markdown_enabled": false,
        "reply_parent_comment": null,
        "threading_depth": 0,
        "__typename": "Comment"
    },
    "cursor": "AQHR8HGfro-bIrLKBXuaovTl2aow5mWfhIyavvxB3qQwAOyV0sKqR_YsgvBBjw_gOjUmSVU1YCHcKHrGGGQBX6PTyA"
}

  • You can use regex to find all the data you need.
  • You can even use bs4
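
As a rough illustration of the regex route, here is a minimal sketch that pulls the comment text out of the raw page source. It assumes the `"body": {"text": ...}` layout from the sample object above, which Facebook may change at any time; `extract_comment_texts` and `BODY_TEXT_RE` are names made up for this example.

```python
import json
import re

# Matches `"body": {"text": "<JSON string>"` and captures the raw JSON string
# literal, escapes included. The key layout is an assumption taken from the
# sample object in this thread.
BODY_TEXT_RE = re.compile(r'"body"\s*:\s*\{\s*"text"\s*:\s*("(?:[^"\\]|\\.)*")')


def extract_comment_texts(page_source):
    """Return the text of every comment body found in the page source."""
    # json.loads decodes the captured literal, so escapes such as \u00ea
    # and \/ become real characters.
    return [json.loads(raw) for raw in BODY_TEXT_RE.findall(page_source)]
```

The `page_source` argument could be the body of a plain HTTP response (for example `requests.get(post_url).text`, where `post_url` is your post's URL) or selenium's `driver.page_source`.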

A simple request/response approach may not work if there are more than 100 or so comments, because the remaining comments are hidden (they load only on events) and are injected with JavaScript. In that case you can use selenium: click the element that says "View More Comments" in a loop until it no longer appears. Once you reach the bottom of the page, use driver.page_source to get the entire source code and find the comments' data in it.
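
A minimal sketch of that click loop, assuming Selenium 4 and a driver that already has the post open. The function name `expand_all_comments`, the `pause` and `max_clicks` parameters, and whatever XPath you pass in are assumptions for illustration; the real button markup has to be checked in the browser.

```python
import time


def expand_all_comments(driver, button_xpath, pause=1.0, max_clicks=500):
    """Click the "View More Comments" element until it no longer appears,
    then return the full page source."""
    clicks = 0
    while clicks < max_clicks:
        # "xpath" is the string value of selenium's By.XPATH strategy.
        buttons = driver.find_elements("xpath", button_xpath)
        if not buttons:
            break  # nothing left to expand
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # give the injected comments time to load
    return driver.page_source
```

Usage might look like `expand_all_comments(driver, '//span[contains(text(), "View more comments")]')` after `driver.get(post_url)`; that XPath is a guess, not something confirmed in this thread. A real page may also need scrolling or exception handling around `click()`.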


If you want replies as well, you will probably have to open the post's URL with selenium and:

  1. Close the "Sign Up" modal
  2. Scroll down
  3. Check whether the comment has replies; if it does, keep clicking the replies button for as long as "View More" appears
  4. Extract the data from that comment and store it in some data structure (I used a Python dictionary in my project)
  5. Repeat steps 2-4 for as long as comments remain

There are other ways to do this as well.
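
For step 4, here is a small sketch of flattening one comment object into a Python dictionary. It assumes the object has already been parsed (for example with `json.loads`) and has the same keys as the sample pasted earlier in this thread; `summarize_comment` and the output key names are made up for this example.

```python
def summarize_comment(edge):
    """Flatten one parsed comment object into a small dictionary."""
    node = edge["node"]
    # Any of these sub-objects may be null in the page data, hence `or {}`.
    feedback = node.get("feedback") or {}
    top_reactions = feedback.get("top_reactions") or {}
    reactions = {
        item["node"]["reaction_type"]: item["reaction_count"]
        for item in top_reactions.get("edges", [])
    }
    return {
        "id": node.get("legacy_fbid"),
        "author": (node.get("author") or {}).get("name"),
        "text": (node.get("body") or {}).get("text"),
        "created_time": node.get("created_time"),
        "reaction_total": (feedback.get("reactors") or {}).get("count"),
        "reactions": reactions,
    }
```

Missing keys come back as `None` instead of raising, which helps when the layout differs from the sample.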

shaikhsajid1111 avatar Feb 02 '21 15:02 shaikhsajid1111

Hello, I'm sorry, but I don't understand how to get information out of the JavaScript with BeautifulSoup. Can you give me some example code or something to learn from? Thank you.

marcomameli1992 avatar Feb 05 '21 10:02 marcomameli1992

Are you using requests or selenium? Do you need to get the replies as well?

shaikhsajid1111 avatar Feb 08 '21 13:02 shaikhsajid1111

Hi, I'm also interested in understanding how to get comments from a post. Can you publish the code you used to obtain the object in the second post? I'm making a simple request to the post URL and then using BeautifulSoup. Can you help me get the comments of a particular post?

valerio1805 avatar Mar 08 '21 20:03 valerio1805

@valerio1805 If you make a request and use the response to get the comments, you'll only get a few of them.

For example, suppose a post has 1000 comments and you send a request. You will only get around the first 10 comments, without their replies, because the other 990 comments were never loaded; loading them depends on JavaScript's onClick() event.

Facebook uses a framework similar to React.js (only they know exactly which one) that injects content via JavaScript, which makes it extremely hard to scrape without browser interaction. So BeautifulSoup alone won't get you the comments; you have to use a framework like puppeteer (for JavaScript) or selenium (for Python).

shaikhsajid1111 avatar Mar 09 '21 04:03 shaikhsajid1111

Thanks, I will try selenium to extract the comments from a post.

valerio1805 avatar Mar 09 '21 06:03 valerio1805