`get_posts_by_search` has no return

Open KKHYA opened this issue 2 years ago • 10 comments

Hello, guys! I'm having trouble using get_posts_by_search to get posts. Here's my code:

import facebook_scraper as fb

fb.set_cookies("cookies.txt")
keywords = "nintendo"

for post in fb.get_posts_by_search(keywords, pages=10, options={"comments": True, "reactors": True, "allow_extra_requests": True}):
    print(post['text'])

It raises no errors but also returns no posts. It works as expected only occasionally, which really confuses me. Do you know how to fix it? I'd really appreciate a reply!

KKHYA avatar Aug 27 '22 04:08 KKHYA

I am having the same issue. Running get_posts_by_search doesn't seem to find any posts. My code is similar to @KKHYA's.

from facebook_scraper import get_posts, get_posts_by_search 

for post in get_posts_by_search('cameroon', cookies='cookies2.txt', extra_info=False, pages=10, options={'comments': True, 'posts_per_page':10}):
    print('test')

I hope this isn't a major problem to fix. I find this repo very helpful! Thanks!

wsimpso1 avatar Aug 29 '22 20:08 wsimpso1

Can someone help with this issue? I really don't know why I can't get any posts with this function.

KKHYA avatar Sep 01 '22 03:09 KKHYA

I uninstalled the package and reinstalled the latest master branch:

pip install git+https://github.com/kevinzg/facebook-scraper.git

Then I tried to scrape using a keyword that I know must have a lot of posts, and I also updated my cookies file.

for post in get_posts_by_search('biden', cookies='cookies3.txt', extra_info=False, pages=5):
    print(post)

Unfortunately, I still do not get any output from this.

wsimpso1 avatar Sep 01 '22 15:09 wsimpso1

I ran this in Google Colab just in case this was an issue in my environment. I didn't solve this, but I do get an additional warning that is informative at least.

WARNING:facebook_scraper.page_iterators:No raw posts (<article> elements) were found in this page.

It seems that get_posts_by_search is no longer returning anything. Is anyone else having this issue? I wish I could be more helpful, but I'm still learning about all this. Thank you!
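
For anyone who wants to detect this condition in a script rather than by reading the console, here is a minimal sketch using only the standard logging module. The logger name and the message fragment are taken from the warning quoted above; the class and function names are made up for illustration.

```python
import logging

# Logger name and message fragment taken from the warning quoted above.
LOGGER_NAME = "facebook_scraper.page_iterators"
NO_POSTS_MARKER = "No raw posts"


class NoPostsDetector(logging.Handler):
    """Count how often the 'No raw posts' warning is emitted."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.hits = 0

    def emit(self, record: logging.LogRecord) -> None:
        if NO_POSTS_MARKER in record.getMessage():
            self.hits += 1


def watch_for_empty_pages() -> NoPostsDetector:
    """Attach a detector to the page_iterators logger and return it."""
    detector = NoPostsDetector()
    logging.getLogger(LOGGER_NAME).addHandler(detector)
    return detector
```

Call watch_for_empty_pages() before iterating over get_posts_by_search, then check detector.hits after the loop. A non-zero count suggests the parser found no post containers on at least one page.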

wsimpso1 avatar Sep 01 '22 21:09 wsimpso1

I encountered the same issue. I added the following to my code and it solved the problem. It's not a long-term solution, but I hope it helps.

page_iterators.py

class SearchPageParser(PageParser):
    cursor_regex = re.compile(r'href[:=]"[^"]+(/search/[^"]+)"')
    cursor_regex_2 = re.compile(r'href":"[^"]+(/search/[^"]+)"')

    def get_page(self) -> Page:  # added
        return super()._get_page('article', 'article')  # added

    def get_next_page(self) -> Optional[URL]:
        if self.cursor_blob is not None:
            match = self.cursor_regex.search(self.cursor_blob)
            if match:
                return match.groups()[0]
            match = self.cursor_regex_2.search(self.cursor_blob)
            if match:
                value = match.groups()[0]
                return value.encode('utf-8').decode('unicode_escape').replace('\\/', '/')
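
For readers following the pagination side of this patch, here is a standalone sketch of the cursor-matching logic in get_next_page above. The two regexes are copied from the snippet; the function name next_search_url and both sample blobs are invented for illustration and are not real Facebook responses.

```python
import re
from typing import Optional

# Regexes copied from the SearchPageParser snippet above.
CURSOR_REGEX = re.compile(r'href[:=]"[^"]+(/search/[^"]+)"')
CURSOR_REGEX_2 = re.compile(r'href":"[^"]+(/search/[^"]+)"')

# Hypothetical cursor blobs, invented for this example.
HTML_BLOB = 'href="https://m.facebook.com/search/posts/?q=nintendo&cursor=abc"'
JSON_BLOB = r'"href":"https://m.facebook.com/search/posts/?q=nintendo\u0026cursor=abc"'


def next_search_url(cursor_blob: Optional[str]) -> Optional[str]:
    """Return the relative /search/ URL of the next page, or None."""
    if cursor_blob is None:
        return None
    match = CURSOR_REGEX.search(cursor_blob)
    if match:
        return match.group(1)
    match = CURSOR_REGEX_2.search(cursor_blob)
    if match:
        # JSON-style blobs carry escapes such as \u0026, so decode them.
        value = match.group(1)
        return value.encode('utf-8').decode('unicode_escape').replace('\\/', '/')
    return None


print(next_search_url(HTML_BLOB))
print(next_search_url(JSON_BLOB))
```

In this sketch, a blob that matches neither pattern yields None. If the library treats a missing next-page URL as the end of the results, a search that stops early may be worth checking against these patterns.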

liuying12138 avatar Sep 07 '22 07:09 liuying12138

I came across the same situation. I modified the source code as shown below, and that solved it. I suppose that Facebook's site markup has changed. Hope it helps.

class SearchPageParser(PageParser):
    cursor_regex = re.compile(r'href[:=]"[^"]+(/search/[^"]+)"')
    cursor_regex_2 = re.compile(r'href":"[^"]+(/search/[^"]+)"')

    def get_page(self) -> Page:  # added
        return super()._get_page('div[data-module-role="TOP_PUBLIC_POSTS"]', 'article')  # added

    def get_next_page(self) -> Optional[URL]:
        if self.cursor_blob is not None:
            match = self.cursor_regex.search(self.cursor_blob)
            if match:
                return match.groups()[0]
            match = self.cursor_regex_2.search(self.cursor_blob)
            if match:
                value = match.groups()[0]
                return value.encode('utf-8').decode('unicode_escape').replace('\\/', '/')

yangsu10yen avatar Sep 22 '22 02:09 yangsu10yen

@yangsu10yen It works! Sorry for replying so late. But here's another problem: get_posts_by_search ends after getting 9 posts. Do you know why that happens?

KKHYA avatar Nov 27 '22 18:11 KKHYA

@KKHYA Did you update the code and reinstall the package? I still get 0 results even after changing the source code.

belhajManel avatar Dec 03 '22 21:12 belhajManel

@belhajManel Yes, I forked the source code and updated it. Then I installed my updated package, and it returned 9 posts. But it still returns only 9 posts now.

KKHYA avatar Jan 12 '23 02:01 KKHYA

@KKHYA Sorry for the delay in replying.

I ran into the same situation, although in my case I got no posts at all rather than 9.

After investigating the cause, it seems that the tag used to find the page container (in PageParser.get_page()) has changed from <article> to <div>.

After applying the following change, we were able to get the posts successfully. Hope this helps.

index cbe0b59..2cead68 100644
--- a/facebook_scraper/page_iterators.py
+++ b/facebook_scraper/page_iterators.py
@@ -143,7 +143,7 @@ class PageParser:

     def get_page(self) -> Page:
         # Select only elements that have the data-ft attribute
-        return self._get_page('article[data-ft*="top_level_post_id"]', 'article')
+        return self._get_page('article[data-ft*="top_level_post_id"], div[data-ft*="top_level_post_id"]', 'article')

     def get_raw_page(self) -> RawPage:
         return self.html
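
To illustrate why the broader selector matters, here is a rough standard-library sketch that counts, per tag, the elements whose data-ft attribute mentions top_level_post_id. The sample markup and all names in it are hypothetical; real Facebook pages will look different.

```python
from html.parser import HTMLParser

MARKER = "top_level_post_id"

# Hypothetical markup: posts wrapped in <div> instead of <article>.
SAMPLE_HTML = (
    "<div data-ft='{\"top_level_post_id\":\"1\"}'><p>post one</p></div>"
    "<div data-ft='{\"top_level_post_id\":\"2\"}'><p>post two</p></div>"
    "<article>no data-ft here</article>"
)


class PostContainerCounter(HTMLParser):
    """Count elements whose data-ft attribute contains MARKER, per tag."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = {}  # tag name -> number of matching elements

    def handle_starttag(self, tag, attrs):
        data_ft = dict(attrs).get("data-ft") or ""
        if MARKER in data_ft:
            self.counts[tag] = self.counts.get(tag, 0) + 1


def count_post_containers(html: str) -> dict:
    """Return a mapping of tag name to number of candidate post containers."""
    counter = PostContainerCounter()
    counter.feed(html)
    counter.close()
    return counter.counts


print(count_post_containers(SAMPLE_HTML))  # {'div': 2}
```

On markup like this, a selector restricted to article elements would match nothing, while one that also accepts div would find both posts. Running the counter on a saved copy of a real results page is one way to check which tag currently carries the attribute.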

yangsu10yen avatar Mar 26 '23 06:03 yangsu10yen