Legacy-Research-Engine

Store multiple versions of pages for the index & a WBM-style copy if the site goes down.

Open blackforestboi opened this issue 8 years ago • 4 comments

I got a message from Sahil via email, this is his question/proposal:

Can we save multiple copies of the index (i.e., what happens when a page goes down or a link returns a 404)? Will users be redirected to the Wayback Machine?

My answer: Not yet. At this point we do not store the complete page, so you cannot revisit it without a connection, and we do not take multiple versions into account. We only store the text so you can search it again. But this is a feature planned for the future. (Also an excellent idea to connect it to the WBM!! :)

Addition to my answer: I think this is future stuff for now, but we should definitely consider storing a readable version of a website and/or feeding pages into the WBM.
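
A minimal sketch of the fallback Sahil asks about, assuming the public Wayback Machine replay URL pattern (`https://web.archive.org/web/` followed by the original URL, optionally with a timestamp in between). The function names `wayback_url` and `resolve_link` are hypothetical and not part of this project; the HTTP status code is passed in so the sketch needs no network access.

```python
# Assumed public replay prefix of the Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(page_url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine link for page_url (hypothetical helper).

    timestamp is an optional YYYYMMDDhhmmss string (a prefix such as a
    year also works). Without one, the archive is expected to redirect
    to its most recent capture.
    """
    if timestamp:
        return f"{WAYBACK_PREFIX}{timestamp}/{page_url}"
    return f"{WAYBACK_PREFIX}{page_url}"


def resolve_link(page_url: str, status_code: int) -> str:
    """Return the original URL, or a Wayback link if the page is gone."""
    # 404 (Not Found) and 410 (Gone) are treated as "page went down".
    if status_code in (404, 410):
        return wayback_url(page_url)
    return page_url


if __name__ == "__main__":
    print(resolve_link("http://example.com/article", 404))
    print(resolve_link("http://example.com/article", 200))
```

This only rewrites the link; it does not check whether the archive actually holds a capture of the page, and it does not cover storing multiple versions locally.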



blackforestboi, Jan 17 '17 20:01

I think size is the problem then: the database file will be pretty big even if you remove all the elements except the text.

We are talking about storing the data of hundreds of webpages; at work I view 200 to 300 sites daily.

Droyk, Jan 17 '17 20:01

Well, the text itself is not so big; it's basically what we store in the DB. But storing the complete HTML to make the page retrievable as it was is definitely big.

A reader version is conceivable, as it would pull the text that is already in the DB.

blackforestboi, Jan 17 '17 20:01

Or just add an option in the settings menu or in the extension icon menu to store the page on your servers, like the Wayback Machine does... Size wouldn't be much of a problem then, and I think it would solve most of the problem. The only disadvantage is privacy, though ;(

Droyk, Jan 17 '17 20:01

Good idea!

Will keep it in mind. :)

blackforestboi, Jan 17 '17 20:01