facebook-scraper
Translated text in posts with `get_posts` method
Hello! I'm getting posts using the `get_posts` method, and when the original post is not in English, `post_text` includes the translated text right after the original.
Is there any way to get only the original text without the translation? Maybe the translated version is wrapped in some extra HTML tags?
Can you share a link to an example post that causes this problem?
It appears with every non-English post actually (Russian, Italian, Polish etc.):
https://facebook.com/262108804192078/posts/1461142234288723 https://facebook.com/262108804192078/posts/1461123174290629
Just a few examples.
Screenshots:
Original post written in Russian: [screenshot not preserved]
Resulting `post_text`: [screenshot not preserved]
You can disable translations in your Facebook account settings (https://www.facebook.com/settings?tab=language) - does that work for you?
Unfortunately, because I'm using many pre-made accounts with cookies, I'm unable to change the settings for each of them. That's why I'm trying to extract the original text with the scraper alone.
I see. This commit (https://github.com/kevinzg/facebook-scraper/commit/a38acef545019c3cf8a8f31c443c50ce81f41158) adds a new key to the post result dict, called `original_text`. Here's what https://facebook.com/262108804192078/posts/1461142234288723 looks like when extracted using the code as at that commit:
{'available': True,
'comments': 0,
'comments_full': None,
'factcheck': None,
'image': 'https://dtf.ru/cover/fb/c/1076271/1644663329/cover.jpg',
'image_id': None,
'image_ids': [],
'image_lowquality': 'https://external.fakl8-1.fna.fbcdn.net/safe_image.php?d=AQEQ-tUWi-2nRq0u&w=476&h=249&url=https%3A%2F%2Fdtf.ru%2Fcover%2Ffb%2Fc%2F1076271%2F1644663329%2Fcover.jpg&cfs=1&jq=75&ext=jpg&_nc_oe=6f924&_nc_sid=06c271&ccb=3-5&gt=1&_nc_hash=AQFZsR7vCOfH1mDf',
'images': ['https://dtf.ru/cover/fb/c/1076271/1644663329/cover.jpg'],
'images_description': [],
'images_lowquality': ['https://external.fakl8-1.fna.fbcdn.net/safe_image.php?d=AQEQ-tUWi-2nRq0u&w=476&h=249&url=https%3A%2F%2Fdtf.ru%2Fcover%2Ffb%2Fc%2F1076271%2F1644663329%2Fcover.jpg&cfs=1&jq=75&ext=jpg&_nc_oe=6f924&_nc_sid=06c271&ccb=3-5&gt=1&_nc_hash=AQFZsR7vCOfH1mDf'],
'images_lowquality_description': [None],
'is_live': False,
'likes': 0,
'link': 'https://dtf.ru/anime/1076271?fbclid=IwAR1H654cd88f3e5JFqlspnLJkE8x-\\-\\Vi3XfHLXvaKwOzj4hLxj1GxnRbHEo',
'links': [],
'original_request_url': 'https://facebook.com/262108804192078/posts/1461142234288723',
'original_text': 'Студия MAPPA официально анонсировала второй сезон аниме '
'«Магическая битва», чей полнометражный приквел стал самым '
'кассовым фильмом в японском прокате.\n'
'\n'
'Продолжение сериала о школьнике Юдзи выйдет в начале 2023 '
'года.',
'page_id': '262108804192078',
'post_id': 1461142234288723,
'post_text': 'Студия MAPPA официально анонсировала второй сезон аниме '
'«Магическая битва», чей полнометражный приквел стал самым '
'кассовым фильмом в японском прокате.\n'
'\n'
'Продолжение сериала о школьнике Юдзи выйдет в начале 2023 '
'года.\n'
'\n'
'MAPPA has officially announced the second season of the anime '
'"Magic Battle", whose full-fledged prequel became the most '
'cashmere film in Japan.\n'
'\n'
'The continuation of the series about a schoolgirl Yuji will be '
'released in early 2023.',
'post_url': 'https://facebook.com/story.php?story_fbid=1461142234288723&id=262108804192078',
'reaction_count': None,
'reactions': None,
'reactors': None,
'shared_post_id': None,
'shared_post_url': None,
'shared_text': 'DTF.RU\n'
'Студия MAPPA анонсировала второй сезон «Магической битвы» — '
'он выйдет в 2023 году — Аниме на DTF',
'shared_time': None,
'shared_user_id': None,
'shared_username': None,
'sharers': None,
'shares': 0,
'text': 'Студия MAPPA официально анонсировала второй сезон аниме «Магическая '
'битва», чей полнометражный приквел стал самым кассовым фильмом в '
'японском прокате.\n'
'\n'
'Продолжение сериала о школьнике Юдзи выйдет в начале 2023 года.\n'
'\n'
'MAPPA has officially announced the second season of the anime "Magic '
'Battle", whose full-fledged prequel became the most cashmere film in '
'Japan.\n'
'\n'
'The continuation of the series about a schoolgirl Yuji will be '
'released in early 2023.\n'
'\n'
'DTF.RU\n'
'Студия MAPPA анонсировала второй сезон «Магической битвы» — он '
'выйдет в 2023 году — Аниме на DTF',
'time': datetime.datetime(2022, 2, 13, 0, 2, 22),
'timestamp': 1644663742,
'user_id': '262108804192078',
'user_url': 'https://facebook.com/playdtf/?__tn__=C-R',
'username': 'DTF',
'video': None,
'video_duration_seconds': None,
'video_height': None,
'video_id': None,
'video_quality': None,
'video_size_MB': None,
'video_thumbnail': None,
'video_watches': None,
'video_width': None,
'w3_fb_url': None,
'was_live': False,
'with': None}
Please give it a try and check it works ok for you.
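For anyone reading along, here is a minimal sketch of how the two keys could be used together. It relies only on what the output above shows: `post_text` starts with the original text and the translation follows it. The helper name `split_translation` is hypothetical and not part of facebook-scraper, and the fallback for posts without `original_text` is an assumption.

```python
from typing import Optional, Tuple


def split_translation(post: dict) -> Tuple[str, Optional[str]]:
    """Return (original, translation) for a post dict.

    Hypothetical helper, not part of facebook-scraper. It assumes that
    post_text begins with original_text, as in the output shown above.
    """
    combined = post.get("post_text") or ""
    # Fall back to post_text when original_text is missing (older versions).
    original = post.get("original_text") or combined
    translation = None
    if combined.startswith(original) and len(combined) > len(original):
        # Whatever follows the original is treated as the translation.
        translation = combined[len(original):].strip() or None
    return original, translation


# Small offline sample shaped like the scraper output above.
sample = {
    "original_text": "Продолжение сериала выйдет в начале 2023 года.",
    "post_text": "Продолжение сериала выйдет в начале 2023 года.\n\n"
                 "The continuation of the series will be released in early 2023.",
}
original, translation = split_translation(sample)
print(original)
print(translation)

# With the real scraper this might look like the following
# (requires network access and valid cookies, so it is left commented out):
# from facebook_scraper import get_posts
# for post in get_posts(post_urls=["https://facebook.com/262108804192078/posts/1461142234288723"],
#                       cookies="cookies.txt"):
#     print(split_translation(post))
```

If `post_text` does not start with `original_text`, the helper returns `None` for the translation rather than guessing where the split is.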
This is working perfectly. Thank you so much for the help! 👍
@neon-ninja hello again! I've noticed that for long text posts, the original text includes an ellipsis and a "... More" label. You can check it out with this post: https://facebook.com/262108804192078/posts/1463319394071007

I see - try https://github.com/kevinzg/facebook-scraper/commit/527eabafe980347a51da3e9e557cf6a752d75a2a
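As a stopgap on the caller's side, the trailing label itself can be removed in post-processing. This is only a sketch, assuming the label appears at the very end of the text as an ellipsis followed by "More"; the function name `strip_more_label` is hypothetical. It does not recover the truncated part of the post, it only drops the label.

```python
import re

# Matches a trailing "... More" or "… More" label (assumed form of the label).
_MORE_SUFFIX = re.compile(r"\s*(?:\.\.\.|…)\s*More\s*$")


def strip_more_label(text: str) -> str:
    """Remove a trailing '... More' label if present; otherwise return text unchanged."""
    return _MORE_SUFFIX.sub("", text)


print(strip_more_label("A long post that was cut off... More"))
```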
Use google_trans_new after scraping the data.