rss-bridge

Try to scrape Facebook using their mbasic site

Open somini opened this issue 5 years ago • 4 comments

Facebook has an mbasic site at https://mbasic.facebook.com/ which gives me cleaner data than the pseudo-mobile site scraped by both existing bridges.

This is just a suggestion, since I can't see a nicer way to do this.

Pinging existing Facebook bridge maintainers @teromene @logmanoriginal, maybe this was already tried and abandoned?

somini avatar May 24 '20 00:05 somini

It's always safer to scrape the primary site, www.facebook.com, since we never know when alternative sites might be removed.

triatic avatar Jun 01 '20 21:06 triatic

Interesting discussion.

> It's always safer to scrape the primary site, www.facebook.com, since we never know when alternative sites might be removed.

Yes, this sounds like good advice. These site variants appear and disappear like crazy. I never even knew touch.facebook.com or mbasic.facebook.com existed. Sometimes I visit reddit.com and also discover that you can use mobile.twitter.com, or sometimes it is m.something.com, wap.something.com or old.something.com. Or lite.duckduckgo.com or html.duckduckgo.com.

But I also seem to have found other interesting details, and I'd like to share some.

I've recently started to test RSSBridge and other competitors. My main interest is getting RSS out of Facebook.

I'm trying to find out exactly and thoroughly what the different RSSBridge Facebook scraper variants, and their options and parameter choices, result in. I don't see answers to these questions in the docs. I'm not a developer and I don't understand PHP either, so feel free to add your comment if you have real experience and understanding.

Confirmed:

  • the touch-site scraper seems to give me many more feed items, and much older ones.

This is from choosing the "Fb2" scraper, called "Facebook Bridge | Touch Site", which seems to scrape touch.facebook.com, instead of the "Main" scraper that scrapes www.facebook.com. In one example I got 20 items posted in 2020 with the main scraper, while FB2 gave me 100 items going back to 2018. I tried different profiles and keep reproducing these "20" and "100" item counts. Would scraping mbasic.facebook.com eventually get me even more and older items? Good question. Maybe there's even a "secret" Facebook variant that lets you scrape years of old posts :-) ?

Still needs more testing:

  • does it make a difference in the HTML I'm getting? Will it be "cleaner"? Less complicated URLs?
  • does it make any difference in timeouts or speed of scraping?
  • will this choice (mobile/standard scraper) somehow influence the eventual blocking of my scraper by Facebook?
  • will one of these non-canonical variants (touch.facebook.com or mbasic.facebook.com) influence how Facebook redirects you to an international/national site variant? Will it mess up the language?
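The first two questions above could be checked with something like the following. This is only a rough sketch in Python, not anything RSSBridge itself does: the helper names `variant_urls` and `compare_variants` are made up for illustration, and the actual download is left to a `fetch` function that you supply, so nothing here assumes how Facebook responds.

```python
import time
from urllib.parse import urlunsplit

# Hostnames mentioned in this thread; whether each stays available is unknown.
VARIANT_HOSTS = {
    "main": "www.facebook.com",
    "touch": "touch.facebook.com",
    "mbasic": "mbasic.facebook.com",
}


def variant_urls(page):
    """Build one URL per site variant for the same page name (hypothetical helper)."""
    path = "/" + page.strip("/")
    return {
        name: urlunsplit(("https", host, path, "", ""))
        for name, host in VARIANT_HOSTS.items()
    }


def compare_variants(page, fetch):
    """Call the caller-supplied fetch(url) -> str for each variant and record
    the elapsed time and the response size. No HTML parsing is attempted."""
    results = {}
    for name, url in variant_urls(page).items():
        started = time.perf_counter()
        body = fetch(url)
        results[name] = {
            "url": url,
            "seconds": time.perf_counter() - started,
            "chars": len(body),
        }
    return results
```

Passing in your own `fetch` (for example a wrapper around whatever HTTP client you already use) keeps the comparison separate from the question of headers, cookies or language settings, which would still need to be tested on their own.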

Otherwise I'm pretty satisfied with the job RSSBridge does with Facebook. Or better said, amazed that this can still be done in 2020.

A big Thank You to all the guys who maintain these Facebook scrapers. Great job.

I also tried other Python options, like https://github.com/irfancharania/fb-feed-gen, but got fewer results.

m040601 avatar Aug 12 '20 14:08 m040601

In recent months I've been getting worse and worse results with the "regular" FB bridge - it looks like the HTML served is increasingly dirty (actual content difficult to separate from surrounding stuff, problems like #1774, ...). FB2 works intermittently - I generally get good results, but often it fails with "Unable to get the page id.", even when providing the page ID directly, so I think it may be related to rate limiting. Maybe it's time to investigate mbasic more thoroughly?
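If the intermittent failures really are rate limiting (which is only a guess), one common workaround is to retry with increasing delays. A minimal sketch in Python, assuming a caller-supplied `fetch(url)` that raises an exception on failure; `fetch_with_backoff` and its parameters are hypothetical names, not part of RSSBridge:

```python
import time


def fetch_with_backoff(fetch, url, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Retry fetch(url) with exponentially growing pauses between attempts.

    The delays are base_delay, 2*base_delay, 4*base_delay, ... and the last
    error is re-raised once all attempts have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as error:  # any failure counts as "try again later"
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

Whether waiting actually helps here is untested; if the error persists regardless of the delay, rate limiting is probably not the cause.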

About its future availability, I somewhat hope that it's been tied to some "weak" embedded device, so it may be here to stay at least for some while. OTOH the main site is subject to continuous redesign, so it's kind of a moving target anyway?

cvtsi2sd avatar Jan 18 '21 09:01 cvtsi2sd

What is the state of the Facebook bridges as of now?

JLuc avatar Dec 23 '24 20:12 JLuc