
Full-text-search via Stork

Open applejag opened this issue 2 years ago • 12 comments

Closes #324

Changes

  • Added Stork JS & CSS snippets <snippet var="js.stork-search" />
  • Added Stork JS & CSS to docs site
  • Added Stork search component, and have it hidden by default until the Stork CSS is loaded.
  • Added custom CSS to make Stork search somewhat fit the Emanote aesthetic, with responsive design so you can also search on a phone, and with support for the Neuron-like note template.

What's missing

  • [ ] Currently you have to do index building via stork CLI yourself. Preferably enabled with a setting such as:

    # index.yaml, mockup/example of how settings could look
    
    template:
      # generates "/-/stork.st", requires "stork" CLI installed
      #   "never"      (default) Emanote will never try to generate a search index
      #   "always"     Build index on "emanote gen" & "emanote run"
      #   "on-static"  Build index on "emanote gen"
      #   "on-live"    Build index on "emanote run"
      buildStorkSearchIndex: always
    
    page:
      # disable index inclusion on a per-page/per-route basis
      excludeSearchIndex: true
    
  • [x] ~~Easy way to enable this via index.yaml~~ Fixed, now there's only snippets to enable.

  • [x] ~~Fix for Neuron layout (e.g https://jillejr.github.io/emanote/demo/neuron-layout)~~ Fixed! :)
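
The four proposed `buildStorkSearchIndex` values could map onto the two Emanote subcommands roughly as follows. This is only a minimal sketch: the setting itself is a mockup from this description, and `should_build_index` is a hypothetical helper name, not something Emanote provides.

```shell
#!/bin/sh
# Hypothetical helper: decides whether a search index should be built.
#   $1 = value of the proposed buildStorkSearchIndex setting
#   $2 = Emanote subcommand ("gen" or "run")
should_build_index() {
  case "$1:$2" in
    always:gen|always:run) echo yes ;;
    on-static:gen)         echo yes ;;
    on-live:run)           echo yes ;;
    *)                     echo no ;;   # "never" and any unknown value
  esac
}

should_build_index always run      # prints "yes"
should_build_index on-static run   # prints "no"
should_build_index never gen       # prints "no"
```

Treating unknown values like "never" keeps the default safe: no index is built unless it was asked for explicitly.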

Preview

Demo: https://jillejr.github.io/emanote/

image

It's also sticky: (difficult to show in a screenshot, but try it out on e.g https://jillejr.github.io/emanote/resources/vim)

image

Also works on mobile layout:

image

I used that published GitHub Pages to do some manual testing on my phone.

In a separate branch I made small changes to build the index in the CI pipeline and publish it to that above GitHub Pages site.

  • Changes: https://github.com/jilleJr/emanote/compare/feature/stork-search...jilleJr:emanote:feature/stork-search-poc
  • Sample build: https://github.com/jilleJr/emanote/runs/7592894621?check_suite_focus=true

applejag avatar Jul 30 '22 17:07 applejag

<rant>

One thing to consider here is that this has a lot of impact on how Emanote sites are used, and I predict it will also have a great impact on how they are written.

On the negative side, this search feature could:

  • render tags useless.
  • reduce the importance of creating proper folgezettel links to connect your knowledge.

This isn't just an added positive feature, it also dulls down the other features of Emanote.

On the other hand, the positives are things like:

  • user can search for related topics if they've forgotten the main subject, and then navigate via folgezettel links to the one they sought.
  • speeds up usage when you know ahead-of-time what subject you're searching for, as you don't have to navigate through collapsed folders in the sidebar.

Do we want to counteract the negatives somehow? Or document the pros/cons of using search vs tags & uplink trees?

This could actually be a "killer feature" that makes users jump to writing docs in Emanote instead of Hugo or Jekyll. Most programs are after all only as strong as their search feature.

</rant>

applejag avatar Jul 30 '22 19:07 applejag

Note to self: in addition to the UI changes in this PR, stork index generation should be done in postRun.

          STORK_TOML=./-/stork.toml
          STORK_INDEX=./-/stork.st
          echo '[input]' > $STORK_TOML
          echo 'files = [' >> $STORK_TOML
          grep '<title>' -A1 -r --exclude-dir '-' --include '*.html' | grep html- | sed 's/\(.*.html\)-  *\(.*\)/  {path = "\1", url="\1", title = """\2"""},/' | tee -a $STORK_TOML
          echo ']' >> $STORK_TOML
          /opt/stork build --input $STORK_TOML --output $STORK_INDEX

We want to use emanote's model internally rather than parsing HTML files. Also add stork as nix dependency and use staticWhich.
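
As a rough illustration of generating the Stork config from a list of notes instead of grepping rendered HTML, here is a minimal sketch. It assumes the list arrives as tab-separated "path, title" lines on stdin; `make_stork_toml` is a hypothetical helper, and the real implementation would use Emanote's model in Haskell rather than shell.

```shell
#!/bin/sh
# Hypothetical helper: reads "path<TAB>title" lines on stdin and prints a
# Stork input config in the same shape as the pipeline above.
# Assumes titles contain no triple double-quotes.
make_stork_toml() {
  echo '[input]'
  echo 'files = ['
  while IFS="$(printf '\t')" read -r path title; do
    printf '  {path = "%s", url = "%s", title = """%s"""},\n' "$path" "$path" "$title"
  done
  echo ']'
}

# Example input: two notes with their titles.
printf 'index.html\tHome\nguide/start.html\tGetting started\n' | make_stork_toml
```

The output can be redirected to ./-/stork.toml and passed to stork build --input as in the snippet above.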

srid avatar Jul 30 '22 22:07 srid

@jilleJr Thanks! I'll hook the rest of the machinery up, and see if we can have emanote gen automate it all.

srid avatar Jul 30 '22 22:07 srid

Thanks! I'll hook the rest of the machinery up, and see if we can have emanote gen automate it all.

Thank you so much! I can create docs for this later (after this PR is merged, when Emanote's behavior on this is decided)

applejag avatar Jul 31 '22 09:07 applejag

Latest changes:

  • Added support for neuron-like note layout
  • Moved all CSS & JS to snippets
  • Moved searchbox to component instead of template hook, and make it visible via CSS snippet

Demo of neuron-like note layout with search box: https://jillejr.github.io/emanote/demo/neuron-layout.html

image image

applejag avatar Jul 31 '22 13:07 applejag

On second thought, this is how I now envision enabling search, as a 2-step process:

  1. Add command flag or YAML config value to make emanote create the /-/stork.st index file
  2. Add the <snippet var="js.stork-search" /> to your page.headHtml in index.yaml

applejag avatar Jul 31 '22 13:07 applejag

@jilleJr Give 12cf3a1 a try. The stork index is built behind the scenes in both live server and generate mode.

Note to self:

  • [ ] Performance must be bad; so we want to cache the index in memory instead of rebuilding it on every request.
  • [ ] Multiple layer (-L) scenario probably is broken here. Need to fix that.
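
One possible stand-in for the caching idea in the first item, sketched at the file level rather than in memory (so it is a different technique from what the note describes): rebuild only when some note is newer than the existing index. `index_is_stale` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: index_is_stale INDEX DIR
# Succeeds (exit 0) when INDEX is missing, empty, or older than any file in DIR.
index_is_stale() {
  index=$1
  dir=$2
  [ -s "$index" ] || return 0
  # find -newer lists files modified more recently than the index.
  [ -n "$(find "$dir" -type f -newer "$index" | head -n 1)" ]
}

# Usage sketch (paths are placeholders):
#   if index_is_stale ./-/stork.st ./content; then rebuild the index; fi
```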

srid avatar Aug 01 '22 21:08 srid

Design-wise, I'm not sure about the current style. Can we do something like the search UX in https://tailwindcss.com/ ?

Instead of a center search bar, we could have a small search icon somewhere not-too-prominent-but-yet-noticeable - but retain the Tailwind site's Cmd+K/Ctrl+k keybinding, along with the search results appearing on the center screen. The flat theme is not bad.

Also for some strange reason the search bar is not even rendering on certain pages (math and mermaid).

srid avatar Aug 01 '22 21:08 srid

Design-wise, I'm not sure about the current style. Can we do something like the search UX in https://tailwindcss.com/ ?

Instead of a center search bar, we could have a small search icon somewhere not-too-prominent-but-yet-noticeable - but retain the Tailwind site's Cmd+K/Ctrl+k keybinding, along with the search results appearing on the center screen. The flat theme is not bad.

Sure! I'll attempt something! I'll try out having a modal and a discrete search icon.

Also for some strange reason the search bar is not even rendering on certain pages (math and mermaid).

Ah yes, my bad. Those pages overwrite the page.headHtml. Fixed!

applejag avatar Aug 02 '22 05:08 applejag

Ah yes, my bad. Those pages overwrite the page.headHtml. Fixed!

Users shouldn't have to do this (dc82292bade41cf85b11d7758ad3888c4991a531), though. I think search should be a built-in feature, not just something users toggle on/off on individual pages. Maybe it is best to put them in .tpl files. I'll think about this more later.

srid avatar Aug 02 '22 14:08 srid

Users shouldn't have to do this (dc82292), though. I think search should be a built-in feature, not just something users toggle on/off on individual pages. Maybe it is best to put them in .tpl files. I'll think about this more later.

Sure, I've changed it back so it's only added via templates now.

But I can't run it locally, it fails to compile. I probably did something wrong when resolving the merge conflicts. @srid could you take a look?

src/Emanote.hs:53:23: error:
    • Couldn't match type ‘Ema.Asset.Asset LByteString’
                     with ‘Data.ByteString.Lazy.Internal.ByteString’
      Expected: m (SiteOutput SiteRoute)
        Actual: m (Ema.Asset.Asset (Ema.Asset.Asset LByteString))
    • In the expression: pure $ View.emanoteSiteOutput rp m r
      In an equation for ‘siteOutput’:
          siteOutput rp m r = pure $ View.emanoteSiteOutput rp m r
      In the instance declaration for ‘EmaSite SiteRoute’
   |
53 |   siteOutput rp m r = pure $ View.emanoteSiteOutput rp m r
   |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I'm holding off on the design changes until I can compile the program. Sorry, I'm pretty lost when it comes to Haskell stuff.

applejag avatar Aug 13 '22 15:08 applejag

@jilleJr I've fixed it in 460a1f2

srid avatar Aug 13 '22 16:08 srid

@srid What about this now? https://jillejr.github.io/emanote/

image

image

I even did the blurred backdrop thing from tailwindcss' website :P

For the narrow layout it's using only the :mag: emoji when the sidebar is collapsed.

image

I was not really sure what to do with the neuron-like page, it currently steals a lot of the vertical view-space:

image

Thoughts?

applejag avatar Aug 13 '22 20:08 applejag

From an early adopter who knows very little about programming: I managed to install Emanote via 'nix profile --impure install github:jilleJr/emanote/feature/stork-search' on my Mac. Both gen and run create a website with search enabled, but nothing can be searched for. I get messages [Warn#emanote] Generating search index using Stork (this may be expensive) [Info#emanote] Done generating Stork index but stork.st is 0 bytes long in static version (and I suppose is also empty in the run mode).

When installing Emanote with stork on Mac, I need to use the option --impure. Without it, I get the message: Package ‘stork-1.4.2’ in /nix/store/a885zpv9ys2p2x7qnzqvxlsy321mclip-source/pkgs/applications/misc/stork/default.nix:29 is marked as broken, refusing to evaluate. Is stork broken for Mac? Or should I force using cache.garnix.io? (I enabled it, per instructions, but still get the message "ignoring untrusted substituter 'https://cache.garnix.io'".)

Bipodos avatar Aug 14 '22 12:08 Bipodos

From an early adopter who knows very little about programming: I managed to install Emanote via 'nix profile --impure install github:jilleJr/emanote/feature/stork-search' on my Mac. Both gen and run create a website with search enabled, but nothing can be searched for. I get messages [Warn#emanote] Generating search index using Stork (this may be expensive) [Info#emanote] Done generating Stork index but stork.st is 0 bytes long in static version (and I suppose is also empty in the run mode).

Interesting! And concerning!

I'm unable to reproduce this, as it seems to work just fine at least on the Emanote docs.

@Bipodos Which repo have you tested this on? Do you get the same result when checking out this branch and running the bin/run script?

Sample working (for me/on my machine):

(inside cloned https://github.com/jilleJr/emanote, branch feature/stork-search)
$ mkdir -p result
$ nix run github:jilleJr/emanote/feature/stork-search -- gen result -L docs --allow-broken-links

$ du -h result/-/stork.st
396K	result/-/stork.st
(^ not 0 bytes)

(also, can perform searches on it:)
$ stork search --index result/-/stork.st --query foo
{
  "results": [
    {
      "entry": {
        "url": "demo/markdown.html",
        "title": "Extended Markdown \\U0000270d\\U0000fe0f",
        "fields": {}
      },
      "excerpts": [
        {
          "text": "footnotes. You may also reuse[^1] footnotes. [^1]: First footnote example [^2]: Second footnote example. Footnotes within [^1] footnotes are not handled. Task",
          "highlight_ranges": [
...

Sidenote: However when playing around I encountered a different error when running stork on my own notes page (https://github.com/jilleJr/notes) where the resulting stork.st seems to have some issue (or there's some issue in Stork):

(inside cloned https://github.com/jilleJr/notes)
$ mkdir -p result
$ nix run github:jilleJr/emanote/feature/stork-search -- gen result -L content
$ du -h result/-/stork.st
748K	result/-/stork.st
(^ also not 0 bytes)

(but panics when performing searches on it:)
$ stork search --index result/-/stork.st --query foo
thread 'main' panicked at 'split_to out of bounds: 764782 <= 763741', /home/kalle/.cargo/registry/src/github.com-1ecc6299db9ec823/bytes-1.1.0/src/bytes.rs:402:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I'll bring up the above panicked error message over at Stork's GitHub repo.
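
The du -h checks above can also be scripted with the shell's -s test, which is true only for files that exist and are non-empty. A minimal sketch, where `check_index` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical guard: fail loudly when the generated index is missing or empty.
check_index() {
  if [ -s "$1" ]; then
    echo "ok: $1 is non-empty"
  else
    echo "error: $1 is missing or 0 bytes" >&2
    return 1
  fi
}

# Demo on a freshly created (and therefore empty) temporary file.
tmp="$(mktemp)"
check_index "$tmp" || echo "empty index detected"
```

After an emanote gen run this would be called as check_index result/-/stork.st.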

applejag avatar Aug 14 '22 13:08 applejag

When installing Emanote with stork on Mac, I need to use the option --impure. Without it, I get the message: /..snip../ Is stork broken for Mac?

Stork is explicitly supported on Mac https://stork-search.net/docs/install, though I don't know if there's any Mac-related issues on Stork v1.5.0 or perhaps with the Nix-packaged version.

I don't have a Mac so I can't verify this. @srid do you?

applejag avatar Aug 14 '22 13:08 applejag

  1. I checked out github:jilleJr/emanote/feature/stork-search
  2. I run Emanote locally, after installing in line with instructions at https://emanote.srid.ca/start/install
  3. Stork is installed by Emanote in the nix profile, but by default Emanote asks for 1.4.2. On the NixOS package site for stork one can see that 1.5.0 is treated as "unstable". In the Nix package definition for 1.5.0 it is marked as:
    # TODO: Remove once nixpkgs uses macOS SDK 10.14+ for x86_64-darwin
    # Undefined symbols for architecture x86_64: "_SecTrustEvaluateWithError"
    broken = stdenv.isDarwin && stdenv.isx86_64;

I have an Intel MacBook Pro, which is the architecture described as broken. Still, even 1.4.2 is treated as broken. Anyway, the stork installation done via nix profile is visible only from within the profile; I cannot run stork myself. Should one install stork separately?

  4. I did so (downloaded a binary from Stork, made it executable, and put it in a directory on my PATH). It can be called manually, but Emanote installed as a package produces the same 0-byte index. I shall try running directly from the repository.
  5. Anyway, could having lots of Greek in the vault have something to do with it? Is stork Unicode-aware? I am trying to move a project dictionary online.

Bipodos avatar Aug 14 '22 13:08 Bipodos

OK just to try confirm that Stork may be working, could you do the following?

  1. Create file stork.toml

    [input]
    files = [
      {path = "example.txt", url = "0001", title = "Example"},
    ]
    
  2. Create file example.txt

    Lorem ipsum
    Alphabet: Α α, Β β, Γ γ, Δ δ, Ε ε, Ζ ζ, Η η, Θ θ, Ι ι, Κ κ, Λ λ, Μ μ, Ν ν, Ξ ξ, Ο ο, Π π, Ρ ρ, Σ σ/ς, Τ τ, Υ υ, Φ φ, Χ χ, Ψ ψ, Ω ω.
    
  3. Build the index via your downloaded Stork CLI (that you mentioned in your step 4.)

    $ ./stork build -i stork.toml -o stork.st
    Success: Index built successfully, wrote 1,171 bytes.
    Index stats:
      - 1 entries
      - 42 search terms
      - 1,171 bytes per entry
      - 27 bytes per search term
    
    $ du -h stork.st
    4.0K	stork.st
    
  4. Try building via the Stork Nix package:

    $ nix run "nixpkgs#stork" -- build -i stork.toml -o stork.st
    Success: Index built successfully, wrote 1,171 bytes.
    Index stats:
      - 1 entries
      - 42 search terms
      - 1,171 bytes per entry
      - 27 bytes per search term
    
    $ du -h stork.st
    4.0K	stork.st
    
  5. If both above succeed, just a sanity check that search also works:

    $ ./stork search -i stork.st -q alph
    {
      "results": [
        {
          "entry": {
            "url": "0001",
            "title": "Example",
            "fields": {}
          },
          "excerpts": [
            {
              "text": "Lorem ipsum Alphabet: Α α, Β β, Γ γ, Δ",
    ...
    
    $ nix run "nixpkgs#stork" -- search -i stork.st -q alph
    (same output)
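
Steps 1 to 3 and the first half of step 5 can be bundled into one script. This is a sketch under assumptions: it works in a scratch directory, uses a shortened alphabet line, and only calls stork when a binary named `stork` is on PATH (the steps above use ./stork instead).

```shell
#!/bin/sh
# Writes the two fixture files from steps 1 and 2 into a scratch directory,
# then builds and queries the index only if a `stork` binary is on PATH.
fixture_dir="$(mktemp -d)" || exit 1
cd "$fixture_dir" || exit 1

cat > stork.toml <<'EOF'
[input]
files = [
  {path = "example.txt", url = "0001", title = "Example"},
]
EOF

cat > example.txt <<'EOF'
Lorem ipsum
Alphabet: Α α, Β β, Γ γ, Δ δ
EOF

if command -v stork >/dev/null 2>&1; then
  # Same flags as in the steps above: -i input, -o output, -q query.
  stork build -i stork.toml -o stork.st && stork search -i stork.st -q alph
else
  echo "stork not found on PATH; skipping build" >&2
fi
```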
    

applejag avatar Aug 14 '22 13:08 applejag

After adding stork independently of nix, and after running Emanote directly from the repository, as @jilleJr does, I got the same message about stork being broken:

a) To temporarily allow broken packages, you can use an environment variable
          for a single invocation of the nix tools.

            $ export NIXPKGS_ALLOW_BROKEN=1

        Note: For `nix shell`, `nix build`, `nix develop` or any other Nix 2.4+
        (Flake) command, `--impure` must be passed in order to read this
        environment variable.

       b) For `nixos-rebuild` you can set
         { nixpkgs.config.allowBroken = true; }
       in configuration.nix to override this.

       c) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
         { allowBroken = true; }
       to ~/.config/nixpkgs/config.nix.
(use '--show-trace' to show detailed location information)

It works with --impure after run, but the result is identical:

du -h  stork.st                            
  0B	stork.st

Nb. I cannot cd to directory named "-" on Mac via cd - (goes to former directory of the terminal instead). I cannot read the file from a parent directory either; only giving the full path to du works. If you are using abbreviated relative paths, maybe some commands cannot find the file?

Bipodos avatar Aug 14 '22 13:08 Bipodos

Nb. I cannot cd to directory named "-" on Mac via cd - (goes to former directory of the terminal instead)

Small tip: you can do cd ./- or cd -/ instead. This trick works with most CLI programs that give - a special meaning, e.g. wget -O - (targets STDOUT) vs wget -O ./- (targets a file named -).
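
The tip can be tried safely in a scratch directory. A minimal sketch, where `demo_dash_dir` is a hypothetical name and the function body runs in a subshell so the caller's working directory is left alone:

```shell
#!/bin/sh
# Creates a directory literally named "-" under a scratch dir, enters it,
# and prints the name of the directory it ended up in.
demo_dash_dir() (
  workdir="$(mktemp -d)" || exit 1
  cd "$workdir" || exit 1
  mkdir ./-
  cd ./- || exit 1     # plain "cd -" would switch to $OLDPWD instead
  basename "$PWD"
)

demo_dash_dir   # prints "-"
```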

applejag avatar Aug 14 '22 13:08 applejag

@jilleJr, working through your instructions: steps 1 and 2 are done. Step 3:

Stork % stork build -i stork.toml -o stork.st
Success: Index built successfully, wrote 1,171 bytes.
Index stats:
  - 1 entries
  - 42 search terms
  - 1,171 bytes per entry
  - 27 bytes per search term
% du -h stork.st
4,0K	stork.st

This runs a binary downloaded from Stork's GitHub and placed in /opt/local/bin, a location the user can add to PATH (/usr/bin/ is off limits on Mac now).

Step 4:

 % nix run "nixpkgs#stork" -- build -i stork.toml -o stork.st
error: Package ‘stork-1.5.0’ in /nix/store/0c10w4am1xhxjkrmnd8d6rbksvm68311-source/pkgs/applications/misc/stork/default.nix:29 is marked as broken, refusing to evaluate.

followed by the same message as in my comment above.

I then ran export NIXPKGS_ALLOW_BROKEN=1 followed by nix run --impure "nixpkgs#stork" -- build -i stork.toml -o stork.st. It takes some time, because it downloads a lot and compiles in /private/tmp/nix-build-stork-1.5.0.drv-0. It worked:

Success: Index built successfully, wrote 1,171 bytes.
Index stats:
  - 1 entries
  - 42 search terms
  - 1,171 bytes per entry
  - 27 bytes per search term

Step 5: checking search with the binary in /opt/:

 % stork search -i stork.st -q alph
{
  "results": [
    {
      "entry": {
        "url": "0001",
        "title": "Example",
        "fields": {}
      },
      "excerpts": [
        {
          "text": "Lorem ipsum Alphabet: Α α, Β β, Γ γ, Δ",
          "highlight_ranges": [
            {
              "beginning": 12,
              "end": 21
            }
          ],
          "score": 123,
          "internal_annotations": [],
          "fields": {}
        }
      ],
      "title_highlight_ranges": [],
      "score": 246
    }
  ],
  "total_hit_count": 1,
  "url_prefix": ""
}
Via nix:

```
nix run "nixpkgs#stork" -- search -i stork.st -q alph
error: Package ‘stork-1.5.0’ in /nix/store/0c10w4am1xhxjkrmnd8d6rbksvm68311-source/pkgs/applications/misc/stork/default.nix:29 is marked as broken, refusing to evaluate.
```

etc. With `--impure`:

```
nix run --impure "nixpkgs#stork" -- search -i stork.st -q alph
{
  "results": [
    {
      "entry": {
        "url": "0001",
        "title": "Example",
        "fields": {}
      },
      "excerpts": [
        {
          "text": "Lorem ipsum Alphabet: Α α, Β β, Γ γ, Δ",
          "highlight_ranges": [
            {
              "beginning": 12,
              "end": 21
            }
          ],
          "score": 123,
          "internal_annotations": [],
          "fields": {}
        }
      ],
      "title_highlight_ranges": [],
      "score": 246
    }
  ],
  "total_hit_count": 1,
  "url_prefix": ""
}
```
It took a fraction of a second longer than the former approach.
So the problem seems to be elsewhere.


Bipodos avatar Aug 14 '22 14:08 Bipodos

This PR works on my M1 mac, but I don't have an intel mac to test (and we don't use it in CI; #335). In any case, I just upgraded stork to 1.5.0 in this PR. Not sure if that will help or not, @Bipodos

@jilleJr I like your recent design changes. I will have to think about where to put the search box as it is a bit jarring on the note layout definitely; even the book layout's position is not ideal. By the way, on mac Cmd+K should also work. I'm yet to review the .tpl files themselves.

srid avatar Aug 14 '22 14:08 srid

Btw I added a workaround in b308b8db for the issue I experienced earlier (https://github.com/EmaApps/emanote/pull/327#issuecomment-1214382184), basically changing:

stork build --output -

to

stork build --output /dev/stdout

By the way, on mac Cmd+K should also work.

On it!

applejag avatar Aug 14 '22 15:08 applejag

@Bipodos Try the latest PR (after 2b8ff25); it should work now on intel mac.

srid avatar Aug 14 '22 15:08 srid

Commit 0d96667, containing the added Cmd+K and Esc keybindings, has been deployed to https://jillejr.github.io/emanote/

applejag avatar Aug 14 '22 15:08 applejag

I recompiled Emanote after the changes through nix profile install github:jilleJr/emanote:feature/stork-search, and also ran nix run github:jilleJr/emanote/feature/stork-search -- gen "/Users/... -L "/Users/.../content/". I no longer have to run with the --impure switch.

image (Screenshot 2022-08-14 at 18 27 13)

Cmd-K works. Otherwise, the index stork.st is still 0 bytes long. Obviously, nothing can be found.

Maybe there is something wrong with the material to be indexed (not just Greek, but its ancient variant) or with index.yaml. I will give the developers access to a copy of the source vault. If my wife lets me use her M1, I may see whether this is a processor architecture problem. Anyway, I do not suppose I could have messed things up so badly that the site is created but with a 0-byte index.

Bipodos avatar Aug 14 '22 16:08 Bipodos

Emojis are broken in the search results. The \U000... part.

image

srid avatar Aug 14 '22 16:08 srid

Emojis are broken in the search results:

Yep I noticed that too. Also some Markdown formatting is leaking through, such as the pipes in tables and the wiki-links.

Would it be possible/feasible to send the HTML-rendered pages to Stork instead of the markdown?

applejag avatar Aug 14 '22 16:08 applejag

Would it be possible/feasible to send the HTML-rendered pages to Stork instead of the markdown?

Not in live server (and we want Stork to work in live server the same way it does in statically generated site, as this PR currently demonstrates), because of incremental build (HTML is rendered only for the requested page, on demand).

This is why Emanote has to use Tailwind 2.x in live server, but Tailwind 3.0 in generated mode. 3.0 has no CDN, and forces the use of JIT compiler.

srid avatar Aug 14 '22 16:08 srid

@Bipodos Sad to say, but your project mostly just works fine on my machine.

I faced some issues, but resulting in a 0 byte stork.st file was never one of them.

The issues I faced were:

  • I had to set LANG=C.UTF-8 & LC_ALL=C.UTF-8 when running Emanote, otherwise it just complained about "invalid character" in the filenames.
  • Emanote was complaining HEAVILY about broken links ("Found 192 broken links!") for files that I can find when doing regular find for the filename.
  • The index doesn't seem to contain all files. Probably related to above.
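
The locale workaround from the first bullet can be wrapped in a small helper so it applies to a single command only. A minimal sketch, where `with_utf8_locale` is a hypothetical name and the emanote arguments in the usage comment are placeholders:

```shell
#!/bin/sh
# Hypothetical wrapper: runs the given command with a UTF-8 locale set
# only for that command, leaving the calling shell's locale untouched.
with_utf8_locale() {
  LANG=C.UTF-8 LC_ALL=C.UTF-8 "$@"
}

# Demo: show the variables as the child process sees them.
with_utf8_locale sh -c 'echo "LANG=$LANG LC_ALL=$LC_ALL"'

# Usage sketch (placeholder arguments):
#   with_utf8_locale emanote gen result -L content
```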

The missing files are a topic of their own that deserves a separate GitHub issue, but in your project I could at least search in the files it did include, e.g. here, where "termin" could be found in 3 different files:

$ stork search --index result/-/stork.st --query termin
{
  "results": [
    /..snip../
  ],
  "total_hit_count": 3,
  "url_prefix": ""
}

For reference, I'm running on Intel, but on GNU/Linux.

applejag avatar Aug 14 '22 17:08 applejag