facebook-scraper Fetching posts by post search?

Hi guys, I like the work you have done with this so far. I'm not sure if this is in the pipeline yet, but I would like to enter a search term into Facebook and get back a certain number of posts per page, including comments.

For example:

    from facebook_scraper import get_posts_by_search, set_cookies

    set_cookies("cookies.txt")
    search_query = "Mark Zuckerberg"

    # generates the post search request URL ('https://www.facebook.com/search/posts/?q=Mark%20Zuckerberg')
    posts = get_posts_by_search(search_query, pages=3, options={"comments": True, "posts_per_page": 5})

    for p in posts:
        pass  # each p would be one post from the returned results
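The search URL in the comment above can be generated with the standard library. This is a minimal sketch, assuming the www.facebook.com/search/posts/?q= pattern from the example; build_post_search_url is a hypothetical helper, not part of facebook_scraper:

```python
from urllib.parse import quote


def build_post_search_url(query: str, base_url: str = "https://www.facebook.com") -> str:
    """Hypothetical helper: build the post search URL for a query.

    The host and path follow the example comment above; they are an
    assumption, not taken from the library's code.
    """
    # quote() encodes spaces as %20 (not "+"); safe="" also encodes "/" and "&"
    return f"{base_url}/search/posts/?q={quote(query, safe='')}"


print(build_post_search_url("Mark Zuckerberg"))
# https://www.facebook.com/search/posts/?q=Mark%20Zuckerberg
```

Passing base_url="https://m.facebook.com" gives the mobile variant discussed later in the thread.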

I'm happy to help work this one out as well.

Aug 03 '21 16:08 exnerfelix

See https://github.com/kevinzg/facebook-scraper/issues/59#issuecomment-830988249 for some related discussion. A pull request for this feature would be welcome.

Aug 03 '21 20:08 neon-ninja

I started playing around with loading posts from search results this week and ran into an issue: Facebook returns the first result immediately in the response and loads more results asynchronously about 1 second later. As a result, I'm only getting 1 post per request, since there is no pagination option on m.facebook.com. What is the best way to work with Facebook's continuous loading logic?

Aug 25 '21 23:08 exnerfelix

The HTML served includes a URL for fetching more results. Search for cursor= to find this see_more_pager URL. If you make a request to that URL, you should get more results, possibly in HTML or JSON format.
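A minimal sketch of pulling that link out of the served HTML with Python's built-in html.parser. It assumes the cursor URL appears as the href of an <a> tag (for example inside the see_more_pager element); if it only shows up inside a script or JSON payload, this won't find it. CursorLinkFinder and find_more_results_url are hypothetical names, not facebook_scraper functions:

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class CursorLinkFinder(HTMLParser):
    """Collects href values of <a> tags whose URL contains 'cursor='."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # html.parser already unescapes attribute values (&amp; -> &)
        href = dict(attrs).get("href")
        if href and "cursor=" in href:
            self.links.append(href)


def find_more_results_url(html: str, base_url: str = "https://m.facebook.com") -> Optional[str]:
    """Return the first 'see more' URL found in the page, or None if there is none."""
    finder = CursorLinkFinder()
    finder.feed(html)
    finder.close()
    # Resolve relative links such as "/search/posts/?q=...&cursor=..." against the host
    return urljoin(base_url, finder.links[0]) if finder.links else None
```

A None result is the natural signal that there are no further pages to request.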

Aug 26 '21 00:08 neon-ninja

Sorry, I'm not sure I can follow. The HTML response in the self.get() method in facebook_scraper.py comes from response = self.session.get(url=url, **self.requests_kwargs, **kwargs), the URL served is the same as the initial input URL, and it does not give me any additional results.

Also, I don't find anything when I search for either cursor= or see_more_pager.

Aug 26 '21 02:08 exnerfelix

The cursor URL is in that response. Try logging response.text.

Aug 26 '21 04:08 neon-ninja

I'm getting the same results out of response.text as well. Here is a simplified version of the code:

    # facebook_scraper.py
    def get(self, url, **kwargs):
        try:
            if not url.startswith("http"):
                url = utils.urljoin(FB_MOBILE_BASE_URL, url)

            # url is 'https://m.facebook.com/search/posts/?q=nintendo'
            response = self.session.get(url=url, **self.requests_kwargs, **kwargs)  # <- returns 1 post

            # Waiting here cannot add content: the full response body has already been received
            time.sleep(10)
            # Save the response to a file so its content is easier to read and inspect
            with open("result.html", "w", encoding="utf-8") as textfile:
                textfile.write(response.text)  # <- contains the same single post

            ### MORE CODE BELOW HERE ###

Another thought as to why I may only be getting 1 result back is that the screen size of the request is too small to load additional content, and I somehow need to do a scrolldown event to trigger a reload of the content.

I'd much appreciate your help, as this is a major blocker for me at the moment.

Aug 26 '21 16:08 exnerfelix

Okay, I found the cursor= reference now, but it only shows up for me when using FB_MBASIC_BASE_URL. So getting multiple pages is no longer the issue, but the page scraper no longer works with FB_MBASIC_BASE_URL; I'll start trying to work around that now. I'd appreciate a comment, though, on why you have FB_MBASIC_BASE_URL as a constant but only use FB_MOBILE_BASE_URL in the current version.

Aug 26 '21 20:08 exnerfelix

Weird, I get cursor= even with m.facebook.com. I have used mbasic a couple of times, but found that I can get mbasic content even on m.facebook.com by setting the noscript=1 cookie with the set_noscript(True) function.
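At the HTTP level, that cookie is just a Cookie header on the request. This is a sketch using only the standard library, assuming (as described above) that noscript=1 is what makes m.facebook.com return mbasic-style markup; build_noscript_request is a hypothetical helper, not the library's implementation, and the request is only constructed here, not sent:

```python
from urllib.request import Request


def build_noscript_request(url: str) -> Request:
    """Hypothetical helper: build a request that carries the noscript=1 cookie."""
    # Per this thread, noscript=1 asks Facebook for its no-JavaScript markup
    return Request(url, headers={"Cookie": "noscript=1"})


req = build_noscript_request("https://m.facebook.com/search/posts/?q=nintendo")
print(req.get_header("Cookie"))  # noscript=1
```

If login cookies are also needed, they would go into the same Cookie header (or into the session's cookie jar alongside noscript=1).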

> Another thought as to why I may only be getting 1 result back is that the screen size of the request is too small to load additional content, and I somehow need to do a scrolldown event to trigger a reload of the content.

I think you need to think at a lower level: we're not using a browser, so we don't have a screen size or any way of scrolling. We're making web requests and getting HTML/JSON back.
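In other words, "loading more" is just another request to the cursor URL. This is a sketch of that loop, assuming the next-page link is an href containing cursor= as described earlier in the thread. iter_result_pages is hypothetical, and fetch is any function that takes a URL and returns the page's HTML (for example lambda u: session.get(u).text with a requests session):

```python
import re
from html import unescape
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin

# Assumed shape of the "see more" link: an href value containing "cursor="
CURSOR_HREF = re.compile(r'href="([^"]*cursor=[^"]*)"')


def iter_result_pages(start_url: str, fetch: Callable[[str], str], max_pages: int = 3) -> Iterator[str]:
    """Yield the HTML of each results page, following cursor links as plain requests."""
    url: Optional[str] = start_url
    pages = 0
    while url is not None and pages < max_pages:
        html = fetch(url)  # one ordinary HTTP request per page: no browser, no scrolling
        yield html
        pages += 1
        match = CURSOR_HREF.search(html)
        # Decode &amp; in the raw href and resolve relative links against the current URL
        url = urljoin(url, unescape(match.group(1))) if match else None
```

Injecting fetch keeps the pagination logic separate from the HTTP session, so the loop can be tried against saved HTML files before pointing it at Facebook.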

Aug 26 '21 21:08 neon-ninja

BTW, I have seen the request headers in my browser when calling Facebook. It sends viewport-width: 1920.

Nov 26 '21 20:11 josx

That's probably just for their analytics, I doubt it has any effect on the returned HTML

Nov 28 '21 20:11 neon-ninja

I need to search by hashtags:

- www.facebook.com is very convoluted.
- m.facebook.com: search is not present.
- mbasic.facebook.com: search is present, and it is easier to work with than www.facebook.com.

I have a WIP on that. The only thing missing right now is a custom PostExtractor that matches mbasic.

Check here: https://github.com/josx/facebook-scraper/commit/e81e5662b085913ad718072925428e42c8f792e7

Any advice is welcome.

Nov 29 '21 20:11 josx

Search is present on m.facebook.com, see https://m.facebook.com/search/posts/?q=search%20query for example. But it seems non-trivial to search for a hashtag, which I think is what you mean.

Nov 29 '21 20:11 neon-ninja

My mistake, but hashtag search is not present on m.facebook.com.

Compare https://mbasic.facebook.com/hashtag/facebook/ and https://facebook.com/hashtag/facebook/ with https://m.facebook.com/hashtag/facebook
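The three URLs being compared differ only by host. This small sketch builds them for any tag; hashtag_urls is a hypothetical helper, and applying a trailing slash to every host is an assumption:

```python
from typing import Dict
from urllib.parse import quote

HOSTS = ("https://mbasic.facebook.com", "https://facebook.com", "https://m.facebook.com")


def hashtag_urls(tag: str) -> Dict[str, str]:
    """Hypothetical helper: map each host to its hashtag page URL."""
    # Accept either "#facebook" or "facebook"; percent-encode anything unusual
    slug = quote(tag.lstrip("#"), safe="")
    return {host: f"{host}/hashtag/{slug}/" for host in HOSTS}
```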

Nov 29 '21 20:11 josx

I think I found a way to solve this issue, but I can't push my solution to its own branch. What should I do?

Dec 06 '21 01:12 Ethan353

Fork the project and submit a pull request.

Dec 06 '21 04:12 neon-ninja

I submitted a pull request with a new branch named search_word.

Dec 06 '21 06:12 Ethan353

Hi there, have you checked the pull request for this issue?

Dec 19 '21 11:12 Ethan353

Merged 👍

Dec 19 '21 23:12 neon-ninja

Could we search for a query in a specific group with this function?

Something like this:

    from facebook_scraper import get_posts_by_search, set_cookies

    set_cookies("cookies.txt")
    search_query = "Mark Zuckerberg"

    # group_id would be the ID of the group to search within
    posts = get_posts_by_search(search_query, group=group_id, options={"comments": True})

    for p in posts:
        pass  # each p would be a post found in this specific group

Mar 04 '22 10:03 gamcoh

This isn't possible on m.facebook.com

Mar 29 '22 23:03 neon-ninja

@neon-ninja Well, isn't it possible to search for a specific query inside a group some other way?

Mar 30 '22 07:03 gamcoh

You can fetch all posts in the group and filter to just posts containing your desired text.

Mar 30 '22 07:03 neon-ninja

> You can fetch all posts in the group and filter to just posts containing your desired text

Yes, but what if the group has lots of posts? I can't download all of them and then filter for the matches.

Mar 30 '22 08:03 gamcoh

Why not? This library is capable of scraping thousands or tens of thousands of posts in mere minutes.
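Filtering also doesn't require keeping every post in memory, because posts can be checked one at a time as they arrive. This is a sketch of a lazy, case-insensitive filter; filter_posts is a hypothetical helper that assumes each post is a dict with a "text" key:

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, Optional


def filter_posts(posts: Iterable[Dict], needle: str, scan_limit: Optional[int] = None) -> Iterator[Dict]:
    """Lazily yield posts whose text contains needle (case-insensitive).

    Posts are consumed one at a time, so nothing is kept beyond the current post.
    scan_limit caps how many posts are examined (None means no cap).
    """
    needle = needle.casefold()
    for post in islice(posts, scan_limit):
        text = post.get("text") or ""  # text may be missing or None
        if needle in text.casefold():
            yield post
```

For example, filter_posts(get_posts(group=group_id, cookies="cookies.txt"), "Mark Zuckerberg", scan_limit=5000), where group_id is a placeholder. If get_posts yields posts lazily, no further pages are requested once you stop iterating.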

Mar 30 '22 20:03 neon-ninja

Hi @neon-ninja, @Ethan353, does getting posts by search work already? When I try to run it, it seems to fail to extract the response.

Jan 15 '23 18:01 dckkk

Hi @neon-ninja, I'm not sure if this is already implemented. I tried get_posts_by_search but did not get any results. Cookies were passed too.

Feb 08 '23 01:02 ericleong86
