Saving Facebook webpages results in a broken output

Open avarixa opened this issue 1 year ago • 7 comments

The monolith output for a Facebook page that requires a login is a broken, mostly unloaded version of the page with the login popup.

Piping the page into monolith from a Chromium/Chrome instance gives the same result, whether or not I use --incognito.

https://imgur.com/0NZwObS

avarixa, Jul 15 '24 14:07

Using Chromium headless, what if you give it more time before it prints the output to STDOUT? I think the flag is --virtual-time-budget=10000.
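For reference, the suggestion above could be sketched roughly as follows. This is a minimal sketch, not a confirmed fix: the function name `save_rendered_page`, the binary name `chromium`, and the example URL and filename are assumptions for illustration. It assumes `--dump-dom` is used to print the rendered DOM, and that monolith reads it from STDIN via `-`, with `-b` supplying the base URL and `-o` the output file.

```shell
#!/bin/sh
# Sketch: let headless Chromium render the page, then bundle the dumped DOM
# with monolith. The function name and the "chromium" binary name are
# assumptions; your install may call it chromium-browser or google-chrome.
save_rendered_page() {
    url="$1"                  # page to capture
    out="$2"                  # output .html bundle
    budget_ms="${3:-10000}"   # virtual time budget in ms (defaults to 10000)

    # --dump-dom prints the rendered DOM to STDOUT; monolith reads it from
    # STDIN ("-") and resolves relative assets against the base URL (-b).
    chromium --headless --incognito \
        --virtual-time-budget="$budget_ms" \
        --dump-dom "$url" \
      | monolith - -b "$url" -o "$out"
}

# Usage (hypothetical URL and filename):
# save_rendered_page "https://example.com/some-page" page.html 10000
```

A larger budget gives scripts on the page more time to run before the DOM is dumped, but it will not get past a login wall on its own.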

snshn, Jul 15 '24 19:07

Thanks for the response!

Just tried it, same result. I even tried it with my main Chrome install instead of Chromium, to no avail.

avarixa, Jul 16 '24 02:07

That's odd. And what if you "Save page as" via the browser: does the result open from file:///?

You can use monolith on local files, by the way: just point it at the file instead of an https:// URL and it'll make one .html bundle out of it.
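As a rough sketch of that local-file workflow: the helper name `bundle_saved_page` and the example filenames are hypothetical, and the only monolith option used is `-o` for the output file.

```shell
#!/bin/sh
# Sketch: bundle a page previously saved with the browser's "Save page as"
# into a single .html file. The helper name is an assumption for illustration.
bundle_saved_page() {
    saved="$1"   # path to the locally saved .html file
    out="$2"     # output .html bundle

    # Fail early with a clear message instead of passing a bad path along.
    [ -f "$saved" ] || { echo "no such file: $saved" >&2; return 1; }

    # Point monolith at the local file instead of an https:// URL.
    monolith "$saved" -o "$out"
}

# Usage (hypothetical filenames):
# bundle_saved_page ./saved-page.html bundle.html
```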

snshn, Jul 16 '24 02:07

Attempt #1: Using Save Page As > Webpage Complete, the file saved on disk had no CSS/JS (for some reason, this has always been a problem with Facebook). When I ran monolith on it, Chrome hit an Out of Memory error upon opening the output.

Attempt #2: Using Save Page As > Single File (.mhtml), the file saved on disk had formatting but was missing some media. This is as close as I got to what I wanted, but I wanted to try to capture as much media as I can with monolith. When I ran monolith on the .mhtml file, the result was a weird HTML file containing only text: the original link, my date and time, and some hash.

avarixa, Jul 16 '24 02:07

Uh-oh. Thank you for trying it. I'll look into the out-of-memory issue and check how I can improve monolith for Facebook pages. I know that website isn't exactly made for archiving; even saving images from FB is a big deal, just like with Instagram. So it may be partially intentional: preventing people from saving pages makes them visit the actual site.

snshn, Jul 16 '24 06:07

Thanks - will mess around with the flags and whatever else I can to see if there's a workaround.

avarixa, Jul 16 '24 08:07

There's always the SingleFile browser extension; that will probably work quite well.

snshn, Jul 16 '24 10:07