archivebox-browser-extension
Send page DOM + screenshot directly to archivebox when saving
When submitting a page to ArchiveBox, the extension should send the page DOM + MHTML + innerText + screenshot as well.
That way there is always a capture of the page exactly as it appears when browsing.
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/tabs/captureVisibleTab
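A minimal sketch of what the capture step could look like, assuming a WebExtension background script that is allowed to capture the active tab. `tabs.captureVisibleTab` is the real API linked above; `buildCapturePayload`, `capturePage`, and all payload field names are hypothetical and not something ArchiveBox defines. MHTML is left out because it needs a separate, browser-specific API.

```javascript
// Hypothetical payload shape: these field names are illustrative only,
// not an ArchiveBox API.
function buildCapturePayload(tab, dom, screenshotDataUrl) {
  return {
    url: tab.url,
    title: tab.title,
    dom: dom.html,                 // serialized DOM as currently rendered
    innerText: dom.text,           // e.g. document.body.innerText
    screenshot: screenshotDataUrl, // data: URL returned by captureVisibleTab
    capturedAt: new Date().toISOString(),
  };
}

// `api` is the extension API object (`browser` in Firefox, `chrome` in
// Chromium). It is passed in so the function can be exercised with a mock.
async function capturePage(api, tab, dom) {
  // Resolves to a data: URL of the visible area of the tab's window.
  const screenshot = await api.tabs.captureVisibleTab(tab.windowId, {
    format: "png",
  });
  return buildCapturePayload(tab, dom, screenshot);
}
```

Note that `captureVisibleTab` only captures the visible viewport, so a full-page screenshot would need scrolling and stitching or a different approach.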
I'm not one of the devs, but it looks possible to either extend the API a bit here: https://github.com/ArchiveBox/ArchiveBox/blob/dev/archivebox/core/views.py#L433
or uncomment and extend the @router.post("/snapshot"...) handler here: https://github.com/ArchiveBox/ArchiveBox/blob/aa55e0d02e644e011e8a09b41c6c6c316c164d3c/archivebox/api/v1_core.py#L317
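On the extension side, posting to such an endpoint might look roughly like the sketch below. Everything server-specific here is an assumption: the endpoint is commented out in the linked code, so `endpointUrl` is a placeholder the caller supplies, and the bearer-token header, the JSON body, and the JSON response are guesses rather than ArchiveBox's actual contract. `buildSnapshotRequest` and `sendSnapshot` are hypothetical names.

```javascript
// Build the request separately from sending it, so it can be inspected.
// ASSUMPTION: JSON body and Bearer auth; the real API may differ.
function buildSnapshotRequest(endpointUrl, apiKey, payload) {
  return {
    url: endpointUrl,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(payload),
    },
  };
}

// `fetchImpl` defaults to the global fetch; pass a stub when testing.
async function sendSnapshot(endpointUrl, apiKey, payload, fetchImpl = fetch) {
  const { url, options } = buildSnapshotRequest(endpointUrl, apiKey, payload);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Upload failed: HTTP ${response.status}`);
  }
  // ASSUMPTION: the server answers with a JSON body.
  return response.json();
}
```

A screenshot sent as a base64 data URL inside JSON is about a third larger than the raw image, so a multipart upload may be a better fit if the server side ends up supporting it.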
I just wondered if this feature could go one step further. I often find the singlepage to be unusable because modern websites are simply broken by all the advertising and JavaScript.
A quick snoop through the DOM inspector to delete those works wonders every time but those changes aren't permanent.
An update function could figure out the snapshot ID from the URL, and getting the content of the modified page is as simple as looping over document.children and collecting their outerHTML.
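The serialization step described here can be sketched as follows. `serializeDocument` is a hypothetical helper name. It takes the document as a parameter, so in a content script it would be called with the page's `document`. Since `document.children` only contains elements, the doctype is added back by hand.

```javascript
// Serialize the live (possibly hand-edited) DOM to an HTML string.
// `doc` is the page's `document`, or any object with `doctype` and `children`.
function serializeDocument(doc) {
  // document.children skips the doctype node, so re-emit it explicitly.
  const doctype = doc.doctype ? `<!DOCTYPE ${doc.doctype.name}>\n` : "";
  // Usually there is a single child, the <html> element.
  const html = Array.from(doc.children, (el) => el.outerHTML).join("\n");
  return doctype + html;
}
```

This reflects whatever is in the DOM at call time, including nodes deleted in the inspector, but state that lives outside markup, such as text typed into form fields or canvas contents, is not captured by `outerHTML`.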
I tried the SingleFile addon (ArchiveBox uses the CLI version), and it does use the modified document. So if ArchiveBox allowed sending the SingleFile output along, it might make sense to add an ArchiveBox feature to the SingleFile addon instead (or in addition).
I built a proof of concept of this that sends the current DOM + screenshot directly to an S3-compatible store (no server needed), and it works great! https://github.com/ArchiveBox/screenshot-to-s3-extension
Any update on this?