Remembrance (local search)

Open Ambrevar opened this issue 1 year ago • 42 comments

Description

New mode that caches the textual content of visited pages, then allows searching them, displaying them, etc.

We currently store the following:

  • URL
  • Title
  • Textual content
  • Keywords.
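
To make the stored record concrete, here is a minimal sketch in Python (not the mode's actual Common Lisp code). The `CachedPage` class and `search_cache` function are hypothetical names, and plain substring matching stands in for the real full-text index.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CachedPage:
    """Hypothetical record mirroring the four stored fields."""
    url: str
    title: str
    content: str
    keywords: List[str] = field(default_factory=list)


def search_cache(pages: List[CachedPage], query: str) -> List[CachedPage]:
    """Return the pages whose title, content or keywords contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    results = []
    for page in pages:
        # Search the textual fields only; the URL is kept as the record's identity.
        haystack = " ".join([page.title, page.content, *page.keywords]).lower()
        if all(term in haystack for term in terms):
            results.append(page)
    return results
```

A real index would tokenize and rank results; this only shows which fields take part in a lookup.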

Discussion

  • [x] Do you like the name "remembrance"?

  • [x] Any better suggestion for naming the "cached pages"?

  • [ ] Better name for look-up-cache? Maybe be more consistent with search-buffer? On the other hand, we don't want to confuse the user.

How to test

  • Enable remembrance-mode.
  • Browse a few URLs.
  • Call recollect-visited-page and type in a search query. Press Alt-Return and run the view-cached-content action.

To do

  • [x] Add option to automatically cache all visited pages.
  • [x] Garbage collect cache after some time.
  • [x] Include keywords in cache? They come for free after all.
  • [x] Allow manual adding of buffers, bookmarks, history entries.
  • [x] Support for diff-ing the current page with the cached page. What's the state of diff-mode? @aadcg ?
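
As an illustration of the "garbage collect cache after some time" item, here is a minimal Python sketch. It assumes each cached entry carries a last-update timestamp; `collect_garbage` and its parameters are hypothetical names, not the mode's API.

```python
from datetime import datetime, timedelta
from typing import Dict


def collect_garbage(last_updates: Dict[str, datetime],
                    now: datetime,
                    max_age: timedelta) -> Dict[str, datetime]:
    """Return a copy of the URL -> last-update map without entries older than max_age."""
    return {
        url: updated
        for url, updated in last_updates.items()
        if now - updated <= max_age
    }
```

Passing `now` explicitly rather than reading the clock inside the function keeps the expiry rule easy to test.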

Checklist:

Everything in this checklist is required for each PR. Please do not approve a PR that does not have all of these items.

  • [x] I have pulled from master before submitting this PR
  • [x] There are no merge conflicts.
  • [x] I've added the new dependencies as:
    • [x] ASDF dependencies,
    • [x] Git submodules,
      cd /path/to/nyxt/checkout
      git submodule add https://gitlab.common-lisp.net/nyxt/py-configparser _build/py-configparser
      
    • [x] and Guix dependencies.
  • [x] My code follows the style guidelines for Common Lisp code. See:
  • [x] I have performed a self-review of my own code.
  • [x] My code has been reviewed by at least one peer. (The peer review to approve a PR counts. The reviewer must download and test the code.)
  • [x] Documentation:
    • [x] All my code has docstrings and :documentations written in the aforementioned style. (It's OK to skip the docstring for really trivial parts.)
    • [x] I have updated the existing documentation to match my changes.
    • [x] I have commented my code in hard-to-understand areas.
    • [x] I have updated the changelog.lisp with my changes if it's anything user-facing (new features, important bug fix, compatibility breakage).
    • [x] I have added a migration.lisp entry for all compatibility-breaking changes.
  • [x] Compilation and tests:
    • [x] My changes generate no new warnings.
    • [x] I have added tests that prove my fix is effective or that my feature works. (If possible.)
    • [ ] New and existing unit tests pass locally with my changes.

Ambrevar avatar Sep 02 '22 14:09 Ambrevar

Question: Am I using internal pages correctly here? Note that I'm enabling remembrance-mode when creating the internal page, which does not seem right. On the other hand, we need to access the cache somehow. Any suggestion? @aadcg @aartaka ?

Also about style: is it possible to not specify the mode style and have (style mode) do the right thing by fetching the default style?

Ambrevar avatar Sep 02 '22 14:09 Ambrevar

Another critical question: should this really be a separate mode, or rather a history option? After all, this is tightly linked to the pages we visit.

Ambrevar avatar Sep 05 '22 08:09 Ambrevar

Another critical question: should this really be a separate mode, or rather a history option? After all, this is tightly linked to the pages we visit.

I'd say keep it a mode, because:

  • It's an opt-in feature.
  • It stores some non-trivial state.
  • The cache is an opaque structure, not sure we want to store it in history or around it.
  • It's also about certain pages/interfaces, not only about history movement.

aartaka avatar Sep 05 '22 11:09 aartaka

  • It's also about certain pages/interfaces, not only about history movement.

What do you mean? Example?

Ambrevar avatar Sep 05 '22 12:09 Ambrevar

I'm not able to test the functionality. When I call look-up-cache I get:

There is no applicable method for the generic function
  #<STANDARD-GENERIC-FUNCTION NYXT:URL (19)>
when called with arguments
  (#<MONTEZUMA:DOCUMENT last-update content title url {101666F2A3}>)

aadcg avatar Sep 05 '22 13:09 aadcg

Do you like the name "remembrance"?

I do!

Any better suggestion for naming the "cached pages"?

From the technical point of view, or from the user point of view? For the former I think we're ok. For the latter, perhaps "rememberable pages".

Better name for look-up-cache? Maybe be more consistent with search-buffer? On the other hand, we don't want to confuse the user.

I believe you want to say "lookup", not "look up".

aadcg avatar Sep 05 '22 14:09 aadcg

Another critical question: should this really be a separate mode, or rather a history option? After all, this is tightly linked to the pages we visit.

A mode, as @aartaka pointed out. Perhaps there could be a user option that would automatically remember buffers based on a list of functions that would take a buffer and output a boolean.
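
A rough sketch of that option, in Python for brevity: `Buffer`, `on_domain` and `should_remember` are hypothetical names, and it assumes a buffer is remembered as soon as any predicate in the list returns true.

```python
from dataclasses import dataclass
from typing import Callable, Sequence
from urllib.parse import urlparse


@dataclass
class Buffer:
    """Hypothetical stand-in for a browser buffer."""
    url: str
    title: str = ""


def on_domain(domain: str) -> Callable[[Buffer], bool]:
    """Build a predicate matching buffers whose host is `domain` or one of its subdomains."""
    def predicate(buffer: Buffer) -> bool:
        host = urlparse(buffer.url).hostname or ""
        return host == domain or host.endswith("." + domain)
    return predicate


def should_remember(buffer: Buffer,
                    predicates: Sequence[Callable[[Buffer], bool]]) -> bool:
    """Assumption: one matching predicate is enough to remember the buffer."""
    return any(predicate(buffer) for predicate in predicates)
```

For example, `should_remember(Buffer("https://en.wikipedia.org/wiki/Lisp"), [on_domain("wikipedia.org")])` is true, while an empty predicate list remembers nothing.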

aadcg avatar Sep 05 '22 14:09 aadcg

Another critical question: should this really be a separate mode, or rather a history option? After all, this is tightly linked to the pages we visit.

A mode, as @aartaka pointed out. Perhaps there could be a user option that would automatically remember buffers based on a list of functions that would take a buffer and output a boolean.

This idea of a buffer function feels like auto-mode/auto-rules...

aartaka avatar Sep 05 '22 14:09 aartaka

@aadcg Just pushed some fixes, should be working now.

Ambrevar avatar Sep 05 '22 15:09 Ambrevar

From the technical point of view, or from the user point of view? For the former I think we're ok. For the latter, perhaps "rememberable pages".

Actually "remembered pages", because we are talking about the data that's already cached to disk.

Better name for look-up-cache? Maybe be more consistent with search-buffer? On the other hand, we don't want to confuse the user.

I believe you want to say "lookup", not "look up".

Nope, it's a verb, hence "look up". "lookup" is a noun.

Ambrevar avatar Sep 05 '22 15:09 Ambrevar

@aartaka I've included an important fix for Nfile profiles (crazy I forgot to implement profile switching!). Before this, tests were not working properly because the "test" profile was not really used.

I've also added 2 helpers, with-headless and wait-on-handler. The last one is super useful to save countless handler boilerplate snippets. It also paves the way for more linear headless test writing.
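
To show the kind of boilerplate such a helper can remove, here is a Python sketch of the general idea, assuming the helper blocks until a hook has run once. The `Hook` class and this `wait_on_handler` signature are hypothetical and not the Lisp macro's interface.

```python
import threading


class Hook:
    """Hypothetical minimal hook: a list of handlers called on each `run`."""

    def __init__(self):
        self._handlers = []
        self._lock = threading.Lock()

    def add(self, handler):
        with self._lock:
            self._handlers.append(handler)

    def remove(self, handler):
        with self._lock:
            if handler in self._handlers:
                self._handlers.remove(handler)

    def run(self, *args):
        # Copy under the lock so handlers can be removed while the hook runs.
        with self._lock:
            handlers = list(self._handlers)
        for handler in handlers:
            handler(*args)


def wait_on_handler(hook, trigger=None, timeout=5.0):
    """Block until `hook` runs once, then return the arguments it was run with."""
    fired = threading.Event()
    received = []

    def one_shot(*args):
        received.append(args)
        fired.set()

    hook.add(one_shot)
    try:
        if trigger is not None:
            trigger()  # Start the action that should eventually run the hook.
        if not fired.wait(timeout):
            raise TimeoutError("hook did not run within the timeout")
    finally:
        hook.remove(one_shot)
    return received[0]
```

A test can then read top to bottom: trigger a page load, wait on the load-finished hook, and assert on the result, with no hand-written handler registration and cleanup.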

I propose we add it to nhooks, what do you think?

Ambrevar avatar Sep 05 '22 15:09 Ambrevar

A mode, as @aartaka pointed out. Perhaps there could be a user option that would automatically remember buffers based on a list of functions that would take a buffer and output a boolean.

This idea of a buffer function feels like auto-mode/auto-rules...

Indeed, we want to be able to toggle the mode based on rules, like domain matching rules.

Ambrevar avatar Sep 05 '22 15:09 Ambrevar

@aartaka I've included an important fix for Nfile profiles (crazy I forgot to implement profile switching!). Before this, tests were not working properly because the "test" profile was not really used.

We've been using it wrong all this time 0_o

I've also added 2 helpers, with-headless and wait-on-handler. The last one is super useful to save countless handler boilerplate snippets. It also paves the way for more linear headless test writing.

I propose we add it to nhooks, what do you think?

Yes, why not :) Seems like we have to put all the fancy recent hook macros into a separate file already...

aartaka avatar Sep 05 '22 19:09 aartaka

Yes, why not :) Seems like we have to put all the fancy recent hook macros into a separate file already...

Where?

Ambrevar avatar Sep 06 '22 06:09 Ambrevar

I'm still not able to test it. I started Nyxt without a config file.

[Screenshot: 2022-09-06_09:46:14]

aadcg avatar Sep 06 '22 06:09 aadcg

Yes, why not :) Seems like we have to put all the fancy recent hook macros into a separate file already...

Where?

I mean, into a dedicated file in nhooks repo :)

aartaka avatar Sep 06 '22 07:09 aartaka

@aadcg Should work now.

Ambrevar avatar Sep 06 '22 10:09 Ambrevar

@Ambrevar I'm now getting the following error when enabling remembrance-mode:

When attempting to read the slot's value (slot-value), the slot
NYXT::FN is missing from the object
#<NYXT:COMMAND NYXT/REMEMBRANCE-MODE:REMEMBRANCE-MODE (1)>.

aadcg avatar Sep 07 '22 08:09 aadcg

Am I right that in order to search over cached files you first need to open them and then search?

aadcg avatar Sep 07 '22 08:09 aadcg

I can't reproduce @aadcg. Can you share a precise recipe?

Am I right that in order to search over cached files you first need to open them and then search?

Just follow the original post instructions: enable the mode, load some URLs, then call recollect-visited-page.

Ambrevar avatar Sep 07 '22 10:09 Ambrevar

New feature: the query terms are now highlighted in the content view!
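
One way such highlighting can be done on plain text, sketched in Python: the `highlight` function is a hypothetical illustration (wrapping matches in `<mark>` tags is an assumption), not what the mode does internally.

```python
import html
import re


def highlight(text: str, query: str) -> str:
    """Escape `text` for HTML and wrap each query term occurrence in <mark> tags."""
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return html.escape(text)
    pattern = re.compile("(" + "|".join(terms) + ")", re.IGNORECASE)
    # With one capturing group, re.split alternates non-matches and matches.
    parts = pattern.split(text)
    rendered = []
    for index, part in enumerate(parts):
        escaped = html.escape(part)
        rendered.append(f"<mark>{escaped}</mark>" if index % 2 == 1 else escaped)
    return "".join(rendered)
```

Escaping each piece before wrapping it keeps characters such as `&` or `<` in the cached text from being interpreted as markup.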

Ambrevar avatar Sep 07 '22 11:09 Ambrevar

It's now mostly feature complete (minus the diff support), ready for review.

I'll complete the tests tomorrow.

Ambrevar avatar Sep 07 '22 11:09 Ambrevar

Wait, it only works on HTTP(S) pages? I wanted to use the fulltext search on my own custom schemes...

Just kidding, that's most probably out of scope.

aartaka avatar Sep 08 '22 07:09 aartaka

Regarding displaying cached pages, they look a bit off, since the lines are too long and it seems that I can't even scroll them horizontally.

[Screenshot: 2022-09-08_11:20:20]

It seems that searching them yields no results... [Screenshot: 2022-09-08_11:22:07]

Another thing that looks off is the prompt message in the following prompt: [Screenshot: 2022-09-08_11:27:53]

aadcg avatar Sep 08 '22 08:09 aadcg

Wait, it only works on HTTP(S) pages? I wanted to use the fulltext search on my own custom schemes...

No, it works with everything where Parenscript works. Except internal pages, well, because there is (for now) little point.

Ambrevar avatar Sep 08 '22 13:09 Ambrevar

Regarding displaying cached pages, they look a bit off, since the lines are too long and it seems that I can't even scroll them horizontally.

Can't reproduce. There should be a horizontal scroll bar at the bottom of the "pre" frame. If you have a touchpad, horizontal scrolling also works fine.

That said, maybe we can use a regular <p> tag instead? Thoughts?

Alternative: store both the innerText and the innerHTML:

  • use the innerText for searching;
  • use the innerHTML for rendering.

Drawbacks:

  • It's (much?) heavier in the database;
  • In some cases it may display worse content because HTML-without-JS is not always meant to be :p
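
The alternative above can be sketched as follows, in Python for brevity. `RememberedPage`, `find_pages` and `render` are hypothetical names, and substring matching stands in for the real index.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RememberedPage:
    """Hypothetical record holding both representations of a page."""
    url: str
    inner_text: str  # Searched, never rendered.
    inner_html: str  # Rendered, never searched.


def find_pages(pages: List[RememberedPage], query: str) -> List[RememberedPage]:
    """Match query terms against the text only, so markup never produces hits."""
    terms = query.lower().split()
    if not terms:
        return []
    return [
        page for page in pages
        if all(term in page.inner_text.lower() for term in terms)
    ]


def render(page: RememberedPage) -> str:
    """Return the stored HTML for display."""
    return page.inner_html
```

With a page whose HTML is `<div class="post"><p>Hello <b>world</b></p></div>` and whose text is `Hello world`, a search for "world" finds it while a search for "div" does not. The HTML string is also several times longer than the text, which is the storage cost listed in the first drawback.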

Ambrevar avatar Sep 08 '22 14:09 Ambrevar

It seems that searching them yields no results...

You found a bug in search-buffer-mode! It does not work on pre tags... @aartaka Any idea why?

Ambrevar avatar Sep 08 '22 14:09 Ambrevar

The HTML content is now displayed in the view!

Ambrevar avatar Sep 12 '22 08:09 Ambrevar

While testing this I got the following error:

EOF
   [Condition of type SIMPLE-ERROR]

Restarts:
 0: [ABORT] abort thread (#<THREAD "Nyxt renderer signal handler" RUNNING {100A298CB3}>)

Backtrace:
 0: ((:METHOD MONTEZUMA::REFILL (MONTEZUMA::BUFFERED-INDEX-INPUT)) #<MONTEZUMA::CS-INDEX-INPUT {1008BD7463}>) [fast-method]
      Locals:
        MONTEZUMA::SELF = #<MONTEZUMA::CS-INDEX-INPUT {1008BD7463}>
 1: ((:METHOD MONTEZUMA::READ-BYTE (MONTEZUMA::BUFFERED-INDEX-INPUT)) #<MONTEZUMA::CS-INDEX-INPUT {1008BD7463}>) [fast-method]
      Locals:
        MONTEZUMA::SELF = #<MONTEZUMA::CS-INDEX-INPUT {1008BD7463}>
 2: ((:METHOD MONTEZUMA::READ-VINT (MONTEZUMA::INDEX-INPUT)) #<MONTEZUMA::CS-INDEX-INPUT {1008BD7463}>) [fast-method]
      Locals:
        MONTEZUMA::SELF = #<MONTEZUMA::CS-INDEX-INPUT {1008BD7463}>
 3: ((:METHOD MONTEZUMA::READ-TERM-BUFFER (MONTEZUMA::TERM-BUFFER T T)) #<MONTEZUMA::TERM-BUFFER field:"content" text:"한국어" {1008BD7473}> #<MONTEZUMA::CS-INDEX-INPUT {1008BD7463}> #<MONTEZUMA::FIELD-INFOS..
      Locals:
        MONTEZUMA::FIELD-INFOS = #<MONTEZUMA::FIELD-INFOS {10085C84C3}>
        MONTEZUMA::INPUT = #<MONTEZUMA::CS-INDEX-INPUT {1008BD7463}>
        MONTEZUMA::SELF = #<MONTEZUMA::TERM-BUFFER field:"content" text:"한국어" {1008BD7473}>
 4: ((:METHOD MONTEZUMA::NEXT? (MONTEZUMA::SEGMENT-TERM-ENUM)) #<MONTEZUMA::SEGMENT-TERM-ENUM {100883F243}>) [fast-method]
      Locals:
        MONTEZUMA::SELF = #<MONTEZUMA::SEGMENT-TERM-ENUM {100883F243}>
 5: ((:METHOD MONTEZUMA::SCAN-TO (MONTEZUMA::SEGMENT-TERM-ENUM T)) #<MONTEZUMA::SEGMENT-TERM-ENUM {100883F243}> #S(MONTEZUMA::TERM :FIELD "url" :TEXT "https://duckduckgo.com/l/?uddg=https%3A%2F%2Fen.wikip..
      Locals:
        MONTEZUMA::SELF = #<MONTEZUMA::SEGMENT-TERM-ENUM {100883F243}>
        MONTEZUMA::TERM = #S(MONTEZUMA::TERM ..)
 6: ((:METHOD MONTEZUMA::SCAN-FOR-TERM-INFO (MONTEZUMA::TERM-INFOS-READER T)) #<MONTEZUMA::TERM-INFOS-READER {10085C84E3}> #S(MONTEZUMA::TERM :FIELD "url" :TEXT "https://duckduckgo.com/l/?uddg=https%3A%2F..
      Locals:
        MONTEZUMA::E = #<MONTEZUMA::SEGMENT-TERM-ENUM {100883F243}>
        MONTEZUMA::SELF = #<MONTEZUMA::TERM-INFOS-READER {10085C84E3}>
        MONTEZUMA::TERM = #S(MONTEZUMA::TERM ..)
 7: ((:METHOD MONTEZUMA::TERM-DOC-FREQ (MONTEZUMA::SEGMENT-READER T)) #<MONTEZUMA::SEGMENT-READER "_1" (1 docs, 0 deleted docs, 6 field infos) {1008543BA3}> #S(MONTEZUMA::TERM :FIELD "url" :TEXT "https://..
      Locals:
        MONTEZUMA::SELF = #<MONTEZUMA::SEGMENT-READER "_1" (1 docs, 0 deleted docs, 6 field infos) {1008543BA3}>
        MONTEZUMA::TERM = #S(MONTEZUMA::TERM ..)
 8: ((:METHOD INITIALIZE-INSTANCE :AFTER (MONTEZUMA::TERM-WEIGHT)) #<MONTEZUMA::TERM-WEIGHT query: #<MONTEZUMA:TERM-QUERY "url":"https://duckduckgo.com/l/?uddg=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FSpec..
      Locals:
        MONTEZUMA::QUERY = #<MONTEZUMA:TERM-QUERY "url":"https://duckduckgo.com/l/?uddg=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FSpecial%3ASearch%3Fsearch%3Dukraine%26go%3DGo&rut=25ff3b97aa81448fb84fa293c2d8d7a712f07809f371aaaa021bd7a912e1a077"^1.0 {100A2F7593}>
        MONTEZUMA::SEARCHER = #<MONTEZUMA:INDEX-SEARCHER {1008543BC3}>
        MONTEZUMA::SELF = #<MONTEZUMA::TERM-WEIGHT query: #<MONTEZUMA:TERM-QUERY "url":"https://duckduckgo.com/l/?uddg=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FSpecial%3ASearch%3Fsearch%3Dukraine%26go%3DGo&rut=25ff3b97aa81448fb84fa293c2d8d7a712f07809f371aaaa021bd7a912e1a077"^1.0 {100A2F7593}> {100A2FD0E3}>
 9: ((LAMBDA (SB-PCL::|.P0.| SB-PCL::|.P1.|)) #<unavailable argument> #<unavailable argument>)
      [No Locals]
10: ((:METHOD MONTEZUMA:WEIGHT (MONTEZUMA::QUERY T)) #<MONTEZUMA:BOOLEAN-QUERY with 1 clauses: #<MONTEZUMA:BOOLEAN-CLAUSE :SHOULD-OCCUR #<MONTEZUMA:TERM-QUERY "url":"https://duckduckgo.com/l/?uddg=https%3..
      Locals:
        MONTEZUMA::SEARCHER = #<MONTEZUMA:INDEX-SEARCHER {1008543BC3}>
        MONTEZUMA::SELF = #<MONTEZUMA:BOOLEAN-QUERY with 1 clauses: #<MONTEZUMA:BOOLEAN-CLAUSE ..>>
11: ((:METHOD MONTEZUMA:SEARCH (MONTEZUMA:INDEX-SEARCHER T)) #<MONTEZUMA:INDEX-SEARCHER {1008543BC3}> #<MONTEZUMA:BOOLEAN-QUERY with 1 clauses: #<MONTEZUMA:BOOLEAN-CLAUSE :SHOULD-OCCUR #<MONTEZUMA:TERM-QU..
      Locals:
        MONTEZUMA::FILTER = NIL
        MONTEZUMA::MAX-SIZE = 10000
        MONTEZUMA::NUM-DOCS = 10000
        MONTEZUMA::OPTIONS = (:NUM-DOCS 10000)
        MONTEZUMA::QUERY = #<MONTEZUMA:BOOLEAN-QUERY with 1 clauses: #<MONTEZUMA:BOOLEAN-CLAUSE ..>>
        MONTEZUMA::SELF = #<MONTEZUMA:INDEX-SEARCHER {1008543BC3}>
        MONTEZUMA::SORT = NIL
12: ((:METHOD NYXT/REMEMBRANCE-MODE::SEARCH-CACHE (NYXT/REMEMBRANCE-MODE:REMEMBRANCE-MODE T)) #<NYXT/REMEMBRANCE-MODE:REMEMBRANCE-MODE {10082250D3}> #<MONTEZUMA:BOOLEAN-QUERY with 1 clauses: #<MONTEZUMA:B..
      Locals:
        NYXT:MODE = #<NYXT/REMEMBRANCE-MODE:REMEMBRANCE-MODE {10082250D3}>
        NYXT:QUERY = #<MONTEZUMA:BOOLEAN-QUERY with 1 clauses: #<MONTEZUMA:BOOLEAN-CLAUSE ..>>
13: (NYXT/REMEMBRANCE-MODE::FIND-URL #<QURI.URI.HTTP:URI-HTTPS https://duckduckgo.com/l/?uddg=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FSpecial%3ASearch%3Fsearch%3Dukraine%26go%3DGo&rut=25ff3b97aa81448fb84f..
      Locals:
        REMEMBRANCE-MODE = #<NYXT/REMEMBRANCE-MODE:REMEMBRANCE-MODE {10082250D3}>
        URL = #<QURI.URI.HTTP:URI-HTTPS https://duckduckgo.com/l/?uddg=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FSpecial%3ASearch%3Fsearch%3Dukraine%26go%3DGo&rut=25ff3b97aa81448fb84fa293c2d8d7a712f07809f371aaaa021bd7a912e1a077>
14: (NYXT/REMEMBRANCE-MODE::BUFFER->CACHE #<NYXT:WEB-BUFFER 816 {1007CC8783}> #<NYXT/REMEMBRANCE-MODE:REMEMBRANCE-MODE {10082250D3}>)
      Locals:
        BUFFER = #<NYXT:WEB-BUFFER 816 {1007CC8783}>
        REMEMBRANCE-MODE = #<NYXT/REMEMBRANCE-MODE:REMEMBRANCE-MODE {10082250D3}>
        URL = #<QURI.URI.HTTP:URI-HTTPS https://duckduckgo.com/l/?uddg=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FSpecial%3ASearch%3Fsearch%3Dukraine%26go%3DGo&rut=25ff3b97aa81448fb84fa293c2d8d7a712f07809f371aaaa021bd7a912e1a077>
15: ((:METHOD NYXT:ON-SIGNAL-LOAD-FINISHED (NYXT/REMEMBRANCE-MODE:REMEMBRANCE-MODE T)) #<NYXT/REMEMBRANCE-MODE:REMEMBRANCE-MODE {10082250D3}> #<QURI.URI.HTTP:URI-HTTPS https://duckduckgo.com/l/?uddg=https..
      Locals:
        NYXT:MODE = #<NYXT/REMEMBRANCE-MODE:REMEMBRANCE-MODE {10082250D3}>
        NYXT:URL = #<QURI.URI.HTTP:URI-HTTPS https://duckduckgo.com/l/?uddg=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FSpecial%3ASearch%3Fsearch%3Dukraine%26go%3DGo&rut=25ff3b97aa81448fb84fa293c2d8d7a712f07809f371aaaa021bd7a912e1a077>
16: ((:METHOD NYXT:ON-SIGNAL-LOAD-FINISHED (NYXT:BUFFER T)) #<NYXT:WEB-BUFFER 816 {1007CC8783}> #<QURI.URI.HTTP:URI-HTTPS https://duckduckgo.com/l/?uddg=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FSpecial%3AS..
      Locals:
        NYXT:BUFFER = #<NYXT:WEB-BUFFER 816 {1007CC8783}>
        NYXT:URL = #<QURI.URI.HTTP:URI-HTTPS https://duckduckgo.com/l/?uddg=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FSpecial%3ASearch%3Fsearch%3Dukraine%26go%3DGo&rut=25ff3b97aa81448fb84fa293c2d8d7a712f07809f371aaaa021bd7a912e1a077>

aadcg avatar Sep 12 '22 13:09 aadcg

It seems that searching them yields no results...

You found a bug in search-buffer-mode! It does not work on pre tags... @aartaka Any idea why?

Whaaaaaat? No way...

aartaka avatar Sep 12 '22 16:09 aartaka