module suggestion: firefox history/bookmarks/etc.

Open redthing1 opened this issue 3 years ago • 7 comments

I think a module for accessing Firefox data would be very useful. This documentation on the Mozilla website details how the data is stored.

I am thinking that a separate script could be used to walk through that database and generate a JSON dump (similarly to how rexport works), and then an HPI module could provide access to that data.
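A minimal sketch of what such a dump script might look like, assuming the `moz_places` and `moz_historyvisits` tables of Firefox's `places.sqlite`, with `visit_date` stored as microseconds since the Unix epoch. The function name `dump_history` is made up for illustration. It is probably safest to point it at a copy of the database, since a running Firefox may hold a lock on the live file.

```python
import sqlite3
from datetime import datetime, timezone


def dump_history(db_path: str) -> list:
    """Return one JSON-serializable dict per visit, oldest first (hypothetical helper)."""
    query = """
        SELECT p.url, p.title, v.visit_date
        FROM moz_historyvisits AS v
        JOIN moz_places AS p ON p.id = v.place_id
        ORDER BY v.visit_date
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(query).fetchall()
    finally:
        conn.close()

    visits = []
    for url, title, visit_date in rows:
        # assumption: visit_date is microseconds since the Unix epoch
        dt = datetime.fromtimestamp(visit_date / 1_000_000, tz=timezone.utc)
        visits.append({"url": url, "title": title, "dt": dt.isoformat()})
    return visits
```

The returned list can be passed straight to `json.dump` to produce the JSON file an HPI module would then read.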

I'm new to this project so I am not yet really familiar with how modules work, but when I have time I will attempt it and submit a PR.

redthing1 avatar Dec 24 '20 04:12 redthing1

Oh, it looks like one of the contributors to this repo @seanbreckenridge has already done this!

I wonder if there is already a corresponding HPI module or does it still need to be written?

redthing1 avatar Dec 24 '20 05:12 redthing1

I personally don't use bookmarks in the browser (I just have a text file with a script to open stuff), so I haven't written anything to parse them yet. Feel free to open an issue on ffexport if that's something you're interested in.

Otherwise yeah, ffexport lets you export history; I have a script here that saves my history sqlite file every couple of weeks.

The my.browsing file on my branch uses parts of ffexport to load the data in; it also copies the live history database when computing the results, so they include both the backups and the current history.

As a demo:

>>> from collections import Counter
>>> from urllib.parse import urlparse
>>> from my.browsing import history
>>> Counter(map(lambda v: urlparse(v.url).netloc, history())).most_common(5)
[('github.com', 39666), ('discord.com', 21064), ('www.youtube.com', 19497), ('duckduckgo.com', 19152), ('www.google.com', 9598)]

No need to export it to JSON (though ffexport can do that); it merges the data and removes duplicates directly from copies of the sqlite files.
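Roughly, the merge-and-deduplicate idea looks like the sketch below. This is not ffexport's actual code: the `Visit` tuple and `merge_visits` are hypothetical names, and it assumes a visit is uniquely identified by its URL and timestamp.

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class Visit(NamedTuple):
    url: str
    dt: datetime


def merge_visits(*sources: Iterable[Visit]) -> Iterator[Visit]:
    """Yield each visit once, even if it appears in several backups."""
    seen = set()
    for source in sources:
        for visit in source:
            # assumption: url + timestamp together identify a visit
            key = (visit.url, visit.dt)
            if key in seen:
                continue
            seen.add(key)
            yield visit
```

Since periodic backups of the same database overlap heavily, most visits in a newer backup get skipped and only the new ones are emitted.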

I know karlicoss uses promnesia, so that may be why this hasn't been incorporated into HPI.

seanbreckenridge avatar Dec 24 '20 05:12 seanbreckenridge

Just as an update, I've since converted that into browserexport, which supports reading history from:

  • Firefox (and Waterfox)
  • Chrome (and Chromium, Brave, Vivaldi)
  • Safari
  • Palemoon

If you wanted to use this, you could install my HPI modules alongside this repository (see here)

Run hpi module install my.browsing to install dependencies

Set up a config block in your config file like:

from my.core import Paths  # path type used in HPI config annotations

# uses browserexport https://github.com/seanbreckenridge/browserexport
class browsing:
    # folder which contains your backed up databases
    export_path: Paths = "~/data/browsing"

    # additionally, read history from my active firefox database
    from browserexport.browsers.firefox import Firefox

    live_databases: Paths = Firefox.locate_database()

Then use the history function:

[ ~ ] $ ipython

In [1]: from my.browsing import history

In [2]: visits = list(history())

In [3]: len(visits)
Out[3]: 390621

[ ~ ] $ hpi query --limit 1 my.browsing.history
[{"url":"https://duckduckgo.com/?q=Brave+Verified+sites&t=brave","dt":"2020-07-21T00:11:23.544069+00:00","metadata":{"title":"Brave Verified sites at DuckDuckGo","description":null,"preview_image":null,"duration":null}}]

No support for bookmarks yet (I just use this); I may add it in the future if someone is interested.

seanbreckenridge avatar May 23 '21 02:05 seanbreckenridge

That's great, thanks! I'll experiment with hooking it up to cachew, and definitely would be up for using it in Promnesia!

karlicoss avatar May 23 '21 20:05 karlicoss

Sounds good - I think I already have it hooked up to cachew, unless you mean something different. Corresponding promnesia Source for now

The only thing missing before a PR is the FirefoxMobile browser logic; I need to export a db from my (now rooted) phone and look at the browser source file in promnesia.

seanbreckenridge avatar May 24 '21 03:05 seanbreckenridge

Ah, by cachew support I meant 'incremental' caching: ideally, if you add a new database, you'd just 'merge' it in with the previously cached results. That's kind of what the madness here was achieving, but without the madness :) https://github.com/karlicoss/promnesia/blob/ea9d9ef8e654c9daee7f7fb1ac458d586f8d4393/src/promnesia/sources/browser.py#L50-L51
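To illustrate what 'incremental' means here, a toy sketch follows. It is not how cachew or promnesia implement it; `IncrementalMerger` and the `read_db` loader are hypothetical, and the cache lives in memory only. Each database file is read at most once, and its visits are merged into the previous result.

```python
from typing import Callable, Iterable, Set, Tuple

# (url, ISO timestamp): a stand-in for a real visit type
Visit = Tuple[str, str]


class IncrementalMerger:
    """Toy in-memory cache: each database file is read at most once."""

    def __init__(self, read_db: Callable[[str], Iterable[Visit]]) -> None:
        self.read_db = read_db            # hypothetical loader for one database file
        self.processed: Set[str] = set()  # paths that have already been merged
        self.visits: Set[Visit] = set()   # merged, de-duplicated visits

    def update(self, db_paths: Iterable[str]) -> Set[Visit]:
        for path in db_paths:
            if path in self.processed:
                continue  # merged on an earlier call, skip the expensive read
            self.visits.update(self.read_db(path))
            self.processed.add(path)
        return self.visits
```

A real version would need to persist both the set of processed paths and the merged visits between runs, which this sketch leaves out.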

karlicoss avatar May 24 '21 18:05 karlicoss

@redthing1 browser history has a module here now; see here to set it up

If bookmarks from the databases are something you're still interested in, feel free to create an issue here.

seanbreckenridge avatar Feb 14 '22 00:02 seanbreckenridge