
[Q] In what format and where are the indices stored?

Open NightMachinery opened this issue 4 years ago • 3 comments

Adding this info to the readme can be helpful.

NightMachinery avatar May 13 '21 18:05 NightMachinery

Possibly related to #62

danuker avatar Mar 16 '22 10:03 danuker

I had a look at the extension's source: the data is stored in the WebExtensions storage.local area, as [time]: { text } objects. It also keeps both a time index (the timestamps of all existing entries) and a two-week "preloaded cache" for quick access, which means the text of every site visited in the past 14 days stays loaded in browser memory. This could theoretically lead to memory problems, as the websites' texts are never truncated. If you search for text older than that, all entries for the specified time frame (determined using the time index) are retrieved from storage (again, possibly a large amount of memory) and then processed.
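To make that layout concrete, here is a minimal sketch of it in Python. This is not the extension's code: a plain dict stands in for storage.local, and every name (PageStore, time_index, cache, prune_cache, search) is made up for illustration. It only shows the idea of timestamp-keyed entries, a sorted time index, and a 14-day cache that is consulted first.

```python
import bisect

DAY_MS = 24 * 60 * 60 * 1000
CACHE_WINDOW_MS = 14 * DAY_MS  # the two-week "preloaded cache" window


class PageStore:
    """Toy in-memory stand-in for the layout described above (hypothetical names)."""

    def __init__(self):
        self.storage = {}     # stand-in for storage.local: {timestamp: {"text": ...}}
        self.time_index = []  # sorted timestamps of all stored entries
        self.cache = {}       # recent entries kept in memory for quick access

    def add(self, timestamp, text):
        """Store one visited page under its timestamp and update index and cache."""
        self.storage[timestamp] = {"text": text}
        bisect.insort(self.time_index, timestamp)
        self.cache[timestamp] = {"text": text}

    def prune_cache(self, now):
        """Drop cached entries older than the 14-day window (storage is untouched)."""
        cutoff = now - CACHE_WINDOW_MS
        self.cache = {t: e for t, e in self.cache.items() if t >= cutoff}

    def search(self, query, start, end, now):
        """Return timestamps in [start, end] whose text contains the query."""
        cutoff = now - CACHE_WINDOW_MS
        if start >= cutoff:
            # The whole time frame is recent: answer from the in-memory cache.
            candidates = {t: e for t, e in self.cache.items() if start <= t <= end}
        else:
            # Older time frame: use the time index to pick the keys, then load
            # every matching entry from storage before scanning its text.
            lo = bisect.bisect_left(self.time_index, start)
            hi = bisect.bisect_right(self.time_index, end)
            candidates = {t: self.storage[t] for t in self.time_index[lo:hi]}
        needle = query.lower()
        return sorted(t for t, e in candidates.items() if needle in e["text"].lower())
```

Note that both paths scan full page texts, so the memory concern above applies to the sketch as well: the cost of a search grows with the amount of text in the chosen time frame.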

This is all pretty clever and a reasonable implementation imo. I'm not sure what better way there could be within a web extension that still supports full-text search (FTS). The only thing I could think of is a WASM SQLite module.

phil294 avatar Mar 16 '22 14:03 phil294

Thanks for figuring it out! I did not know extensions had a separate "inspect" area when debugging them. More here for newbies, though it seems like it still has no easy export/import functionality in the browser.

> I'm not sure what better way there could be

There are clever data structures like inverted indices, whose vocabulary is bounded, so the set of indexed terms stops growing no matter how much text is on the pages you visit. But the number of URLs still keeps growing.
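For anyone unfamiliar with the idea, here is a minimal sketch of an inverted index in Python. It is only an illustration, not something the extension does: the class and method names are hypothetical, tokenization is a naive regex split, and a query matches pages containing all of its terms.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: maps each term to the set of pages containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of page ids
        self.urls = []                    # page id -> URL

    def add_page(self, url, text):
        """Index one page; each distinct term is recorded once per page."""
        page_id = len(self.urls)
        self.urls.append(url)
        for term in set(re.findall(r"\w+", text.lower())):
            self.postings[term].add(page_id)

    def search(self, query):
        """Return URLs of pages that contain every term in the query."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        # Intersect the posting sets; an unknown term yields an empty set.
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.urls[i] for i in sorted(ids)]
```

The number of keys in postings is limited by the vocabulary, but each posting set gains an entry for every new page that contains the term, which is the "number of URLs still keeps growing" part.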

danuker avatar Mar 16 '22 14:03 danuker