markbind
Auto-search content of pages
Current: only page titles and specified keywords in the frontmatter appear in search results.
Suggested: also include other content in pages for search results
Raising priority as full-text search can greatly enhance the usefulness of a content-heavy website.
I don't mind if full-text search is a separate page altogether and takes some time to load (i.e., if the full search index needs to be downloaded to the browser first).
Are we open to integrating existing solutions for full-text search?
DocSearch (free, open-source):
> DocSearch will crawl your documentation website, push its content to an Algolia index, and allow you to add a dropdown search menu for your users to find relevant content in no time.
> Are we open to integrating existing solutions for full-text search?
Ideally, we should have a decent built-in solution and the ability to integrate other third-party solutions.
As discussed with @marvinchin today, Marvin is planning to explore using the Lunr.js library to implement a built-in full-text search. This library is also used by MkDocs.
Are we still looking to have built-in full text search for V2? 😅 I'm not sure that I can finish it by the end of the semester.
> Are we still looking to have built-in full text search for V2? 😅 I'm not sure that I can finish it by the end of the semester.
Good to have, but not necessary. Same for the FOUC problem. Both have a good-enough workaround but not a full-fledged solution.
I've just published an almost year-long project originally motivated by this issue:
- https://github.com/ang-zeyu/morsels
- https://ang-zeyu.github.io/morsels/
It consists of a CLI file indexer (which can be integrated by copying the binary, similar to what we do for plantuml.jar), a search library powered by wasm (Rust), and a search UI (TypeScript).
It deals with the issue in 2 aspects:

- Scalability

  > I don't mind if full-text search is a separate page altogether and takes some time to load (i.e., if the full search index needs to be downloaded to the browser first).

  I found this issue to be common to many static site generators using Lunr.js or some other client-side search solution. This was my primary motivation in creating this project (see https://github.com/olivernn/lunr.js/issues/222 and the discussion at https://github.com/rust-lang/mdBook/issues/51 for example), although it turned out to be a secondary plus in the end.

  The primary approach / difference here is therefore fragmenting the index into many separate files. At search time, only the files needed (by what's searched) are retrieved.

  The indexer is also written in Rust for this reason (:star: it indexes the entire 2103 site in 0.5s!), as is the search library (wasm using Rust). Alternative JS-based implementations were also trialed and tuned for both; the performance differences are significant.

  This does mean a relatively larger binary / bundle size (334KB gzipped wasm file), something I'm still working to improve. The silver lining is that search hopefully isn't the first thing users activate (i.e., within 1-2s of page load).

- A complete e2e search solution

  Due to minor implications of scalability in the internal design, I also ended up creating an entire search user interface library. To my knowledge there aren't many "complete" (indexer -> search library -> UI) solutions around (barring Algolia DocSearch, which is an entirely different beast).
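To make the index-fragmenting idea above concrete, here is a minimal sketch in Python. It is not how morsels (or any of the other libraries mentioned in this thread) is implemented; the shard count, the `shard-N.json` file layout, and the function names `build_index` and `search` are all assumptions made purely for illustration. The point is only that a query touches just the shard files its terms hash to, rather than the whole index.

```python
import json
import re
import tempfile
import zlib
from collections import defaultdict
from pathlib import Path

NUM_SHARDS = 8  # assumed value; a real indexer would likely size shards by bytes


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shard_of(term):
    """Stable hash, so the indexer and the searcher agree on the shard."""
    return zlib.crc32(term.encode("utf-8")) % NUM_SHARDS


def build_index(pages, out_dir):
    """Write one JSON file per non-empty shard: {term: [page, ...]}."""
    shards = defaultdict(lambda: defaultdict(set))
    for page, text in pages.items():
        for term in tokenize(text):
            shards[shard_of(term)][term].add(page)
    for shard_id, postings in shards.items():
        data = {term: sorted(found) for term, found in postings.items()}
        (Path(out_dir) / f"shard-{shard_id}.json").write_text(json.dumps(data))


def search(query, out_dir):
    """Return (pages containing every query term, number of shards looked up)."""
    terms = tokenize(query)
    loaded = {}
    # Only the shards that the query terms hash to are read from disk.
    for shard_id in {shard_of(term) for term in terms}:
        path = Path(out_dir) / f"shard-{shard_id}.json"
        loaded[shard_id] = json.loads(path.read_text()) if path.exists() else {}
    results = None
    for term in terms:
        found = set(loaded[shard_of(term)].get(term, []))
        results = found if results is None else results & found
    return sorted(results or []), len(loaded)


if __name__ == "__main__":
    pages = {
        "index.html": "Full text search for static sites",
        "panels.html": "Panels hide content until expanded",
    }
    with tempfile.TemporaryDirectory() as out_dir:
        build_index(pages, out_dir)
        print(search("static search", out_dir))
```

In a browser, each shard read would be a network request, which is where the latency trade-off of sharding comes from.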
I haven't really marketed it as I'm still tying up some things (e2e tests, getting Windows Defender to stop flagging the executables as viruses, some more bugs), but I could look into integrating it here sometime 😃.
> I've just published an almost year-long project originally motivated by this issue:
> - https://github.com/ang-zeyu/morsels
> - https://ang-zeyu.github.io/morsels/
Nice work @ang-zeyu! Let's aim to integrate it into MarkBind in due course.
I'm increasing the priority because Algolia DocSearch is undergoing a major revamp and they haven't been able to provide the search support for our module websites this semester so far. The sooner we reduce reliance on third-party search the better.
If anyone would like to take up this issue, please feel free; I think this would be a rather fun thing to do. The library I mentioned above is more or less ready for use. I am currently just doing a fun infinite loop of "making it better and more marketable" but not actually doing any marketing 🤔😅
In the course of doing this, I also came across several related alternatives you can consider. All of them follow a CLI + wasm frontend architecture:
- Stork
- TinySearch
- Pagefind: very recent, and the closest cousin of the library above. It implements the same idea of sharding index files. Currently, the main reason you might not want to use it is the extra network request latency (it has no option to leave the index unfragmented). My library, by contrast, does not shard index files by default but offers the option, to cater to the majority of use cases that do not need sharding (this includes the 2103 site, which generates an index of just ~3MB). Not sharding greatly improves search latency.
Please don't let my selling here stop you from exercising your own judgement. Feel free to come to your own reasoning and choice, and post back here. I would love to hear your thoughts.
Some non-exhaustive guidelines for implementation:
- Consider how to map our current keywords feature to the new solutions
- We currently don't delete output files when their source files are removed during live preview. Deleting them will likely be necessary for any file-based indexing solution you choose, so that the index accurately reflects the current "state" of the content.
- Obviously, the UI component needs to be adapted
- Old header indexing code should be removed
- ❔
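As a rough illustration of the stale-file guideline above, the sketch below prunes generated `.html` files that the current build no longer produces, so that a file-based indexer would not pick up removed pages. This is not MarkBind's code; the function name `prune_stale_pages` and the idea of passing in a list of expected output paths are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def prune_stale_pages(output_dir, expected_pages):
    """Delete generated .html files that the current build did not produce.

    expected_pages: paths relative to output_dir (hypothetical input; how the
    build tracks its outputs is outside the scope of this sketch).
    Returns the relative paths that were removed, in sorted order.
    """
    root = Path(output_dir)
    expected = {Path(page).as_posix() for page in expected_pages}
    removed = []
    # sorted() materializes the listing first, so deleting while looping is safe.
    for html_file in sorted(root.rglob("*.html")):
        rel = html_file.relative_to(root).as_posix()
        if rel not in expected:
            html_file.unlink()
            removed.append(rel)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        root = Path(out_dir)
        (root / "guide").mkdir()
        for name in ("index.html", "guide/old.html", "guide/new.html"):
            (root / name).write_text("<p>page</p>")
        print(prune_stale_pages(out_dir, ["index.html", "guide/new.html"]))
```

Running the indexer only after such a pruning step is one way to keep search results from pointing at pages that no longer exist.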
Hello, I've been looking at this issue, and one problem I've encountered is that content in components that are hidden from the user during the initial render (e.g. Panels) is not included in the search results. This is because libraries like Pagefind index the content only after the HTML files have been built. This rendering problem is also faced by other plugins like dataTable (@Tim-Siu) and Mermaid (@yiwen101 @LamJiuFong).
This behaviour is similar to that of the Algolia DocSearch we use now, which automatically adds algolia-no-index to content hidden by MarkBind's Vue components, causing content hidden in panels to similarly not show up in search results.
With this in mind, I'd like to confirm whether the full-text search we want to implement should include content inside panels, or whether it is OK for that content not to show up in the search results.
> This behaviour is similar to that of the Algolia DocSearch we use now, which automatically adds algolia-no-index to content hidden by MarkBind's Vue components, causing content hidden in panels to similarly not show up in search results.
>
> With this in mind, I'd like to confirm whether the full-text search we want to implement should include content inside panels, or whether it is OK for that content not to show up in the search results.
@jingting1412 I think it is fine (even necessary) to omit content from collapsed panels. But we can index content from expanded-by-default panels, right?
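One possible way to express "skip collapsed panels, index expanded ones" at indexing time is sketched below. It assumes the build step marks collapsed content with an attribute, in the same spirit as the algolia-no-index class mentioned above; the attribute name `data-search-ignore` and the function `indexable_text` are hypothetical and not part of MarkBind or any of the libraries discussed. The sketch also assumes well-formed HTML where every non-void element has a closing tag.

```python
from html.parser import HTMLParser

IGNORE_ATTR = "data-search-ignore"  # hypothetical marker, assumed for this sketch
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class IndexableText(HTMLParser):
    """Collects text, skipping any subtree whose root carries IGNORE_ATTR."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0  # > 0 while inside an ignored subtree

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements have no closing tag, so they never nest
        if self.skip_depth or IGNORE_ATTR in dict(attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html):
    """Return the text of `html` that should be handed to the indexer."""
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = (
        "<div><p>Intro</p>"
        "<div data-search-ignore><p>Collapsed panel body</p></div>"
        "<div><p>Expanded panel body</p></div></div>"
    )
    print(indexable_text(page))
```

Under this scheme, expanded-by-default panels would simply be rendered without the marker, so their content is indexed like any other part of the page.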