facebook-scraper
[Question] A library to search for facebook groups with given keywords
Two years ago this was possible with the Facebook Graph API, which is sadly deprecated now. Is there a library you know of that is capable of doing this?
Why not use Facebook's web interface to do this search? For example:
https://www.facebook.com/search/groups/?q=games or https://m.facebook.com/search/groups/?q=games&source=filter&isTrending=0&tsid=0.44263932103561143
Yeah, sounds good. Your lib would be a good base to start from, as we need to be logged in to search for groups, afaik. For starters, it would be sufficient to be able to provide keywords and get back the group URLs. If you create the request that returns the HTML which includes the URLs, I could write the part that scrapes them, with beautifulsoup for instance.
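To make the scraping half of that proposal concrete, here is a minimal sketch of pulling group URLs out of a search-result page. It uses the standard library's html.parser rather than beautifulsoup so that it runs without extra dependencies. The assumption that group links look like /groups/<id-or-slug> is mine, as are the names GroupLinkParser and extract_group_urls; the real markup may differ.

```python
import re
from html.parser import HTMLParser

# Assumed link shape: /groups/<id-or-slug>, relative or on any facebook.com host.
GROUP_HREF = re.compile(r"^(?:https?://[^/]*facebook\.com)?/groups/([^/?#]+)")


class GroupLinkParser(HTMLParser):
    """Collects unique group URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.group_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = GROUP_HREF.match(href)
        if match:
            # Normalise to one canonical form and drop tracking parameters.
            url = f"https://www.facebook.com/groups/{match.group(1)}/"
            if url not in self.group_urls:
                self.group_urls.append(url)


def extract_group_urls(html):
    """Return group URLs in page order, without duplicates."""
    parser = GroupLinkParser()
    parser.feed(html)
    return parser.group_urls


if __name__ == "__main__":
    # Made-up fragment for illustration only, not real Facebook output.
    sample = (
        '<a href="/groups/1996185023800606/?ref=search">Coffee lovers</a>'
        '<a href="/profile.php?id=1">Someone</a>'
        '<a href="/groups/1996185023800606/">Coffee lovers again</a>'
    )
    print(extract_group_urls(sample))
```

The same logic carries over to beautifulsoup by iterating over the anchors it finds and applying the same regular expression to each href.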
I meant, why not do it manually in your browser? Do you have hundreds of search terms or something?
Yeah, I have 400 keywords that I need to find the Facebook groups for. :)
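With that many keywords, the search URLs themselves are easy to generate. A small sketch, assuming the same search/groups/?q= URL quoted earlier in this thread; build_search_urls is a hypothetical helper name, not part of facebook-scraper. It URL-encodes each keyword so that spaces, "&", and non-ASCII characters survive.

```python
from urllib.parse import urlencode

# Search page quoted earlier in this thread.
SEARCH_BASE = "https://www.facebook.com/search/groups/"


def build_search_urls(keywords):
    """Return one group-search URL per non-empty keyword, with the query encoded."""
    urls = []
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue  # skip blank entries, e.g. empty lines in a keyword file
        urls.append(f"{SEARCH_BASE}?{urlencode({'q': keyword})}")
    return urls


if __name__ == "__main__":
    for url in build_search_urls(["games", "board games", "café & tea"]):
        print(url)
```

Fetching these pages still needs a logged-in session, so this only covers the URL-building step.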
Related issue: https://github.com/kevinzg/facebook-scraper/issues/419
I believe this does what is requested. It adds a method, get_groups_by_search, which searches for groups, finds their IDs, and yields the result of get_group_info for each group_id.
from urllib.parse import quote_plus

from facebook_scraper import FacebookScraper, utils, get_group_info
from facebook_scraper.constants import FB_MOBILE_BASE_URL


class FacebookScraper(FacebookScraper):
    def get_groups_by_search(self, word: str, **kwargs):
        """Searches Facebook groups and yields the group info for each
        result on the first page"""
        # URL-encode the keyword so spaces and "&" do not break the query string
        group_search_url = utils.urljoin(FB_MOBILE_BASE_URL, f"search/groups/?q={quote_plus(word)}")
        r = self.get(group_search_url)
        for group_element in r.html.find('div[role="button"]'):
            button_id = group_element.attrs["id"]
            group_id = find_group_id(button_id, r.text)
            yield get_group_info(group_id)


def find_group_id(button_id, raw_html):
    """Each group button has an id, which appears later in the script
    tag followed by the group id."""
    # Start from the last occurrence of the button id, then read the number after "result_id:"
    s = raw_html[raw_html.rfind(button_id) :]
    group_id = s[s.find("result_id:") :].split(",")[0].split(":")[1]
    return int(group_id)


# EMAIL and PWD stand for your own Facebook credentials
scraper = FacebookScraper()
scraper.login(email=EMAIL, password=PWD)
for group_info in scraper.get_groups_by_search("coffee"):
    print(group_info)
Result:
{'id': '1996185023800606', 'name': 'Coffee lovers', 'type': 'Public group', 'members': 14299}
{'id': '2204925119', 'name': 'COFFEE COFFEE COFFEE!!!', 'type': 'Public group', 'members': 340455}
{'id': '755007758392142', 'name': 'LATTE ART', 'type': 'Public group', 'members': 46079}
{'id': '534483107108037', 'name': 'BARISTA COMMUNITY', 'type': 'Public group', 'members': 169960}
{'id': '721633338172381', 'name': 'Funny Coffee Memes', 'type': 'Public group', 'members': 219281}
{'id': '587751572609633', 'name': 'Coffee ☕ & Rain 🌧', 'type': 'Public group', 'members': 116986}
{'id': '823558245059998', 'name': '林芊妤 Coffee 粉絲群組', 'type': 'Public group', 'members': 7932}
{'id': '1574636316089193', 'name': 'I Love Coffee', 'type': 'Public group', 'members': 208646}
{'id': '120661273275592', 'name': 'Coffee & Cake Lovers 💏 ☕🍰', 'type': 'Public group', 'members': 40836}
{'id': '359032028835121', 'name': 'Coffee ☕❤', 'type': 'Public group', 'members': 21074}
{'id': '364701647546998', 'name': 'COFFEE BEANS MARKET', 'type': 'Public group', 'members': 56691}
{'id': '746157059433578', 'name': 'Coffee Everyday', 'type': 'Public group', 'members': 6113}
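For anyone adapting find_group_id: as posted, it raises an IndexError when the button id or the "result_id:" marker is missing from the page. Below is a sketch of a more defensive variant that returns None instead. The name find_group_id_safe and the sample fragment are made up for illustration; the fragment only mimics the "button id followed later by result_id" layout described in the docstring above and is not real Facebook output.

```python
import re


def find_group_id_safe(button_id, raw_html):
    """Return the group id that follows the last occurrence of button_id, or None."""
    start = raw_html.rfind(button_id)
    if start == -1:
        return None  # the button id does not appear in the page at all
    # Look for the first "result_id:<digits>" after that occurrence.
    match = re.search(r"result_id:(\d+)", raw_html[start:])
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Invented fragment: a button element, then a script that mentions its id.
    sample = (
        '<div role="button" id="u_0_a"></div>'
        '<script>bind("u_0_a",{result_id:1996185023800606,type:"group"})</script>'
    )
    print(find_group_id_safe("u_0_a", sample))
    print(find_group_id_safe("u_0_z", sample))
```

Callers then need to skip None results before passing the id to get_group_info.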
Great - could you please submit a pull request?
Hey, please help me, I can't run this code.
This is the error:
D:\MyJob\Python\PyCharm\ai-report.venv\lib\site-packages\facebook_scraper\facebook_scraper.py:855: UserWarning: Facebook language detected as vi_VN - for best results, set to en_US
  warnings.warn(
Traceback (most recent call last):
  File "D:\MyJob\Python\PyCharm\ai-report\demo.py", line 32, in