mystmd
Support Search in MyST Sites
Search is currently not enabled in mystjs sites, but certainly something we want!
A few options:
- use a hosted service like Algolia
- do something locally at build time with an index (e.g. https://github.com/nextapps-de/flexsearch)
- related issue: https://github.com/executablebooks/jupyter-book/issues/815
It would be nice for offline/local search to work, so search should probably be pluggable, so that it can be taken over by the theme, the deployment, or the site config.
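Below is a minimal TypeScript sketch of the "build-time index" option. It does not use FlexSearch's API. Instead it uses a plain inverted index (token → pages) to show the idea. The build emits a static JSON file, and a theme could load and query that file entirely in the browser, so search would also work offline. `PageRecord`, `buildIndex`, `search`, and `serializeIndex` are hypothetical names and are not part of mystmd.

```typescript
// Hypothetical page record produced by the site build (not a mystmd type).
interface PageRecord {
  url: string;
  title: string;
  text: string;
}

// Lower-case the text and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((t) => t.length > 0);
}

// Build an inverted index: token -> set of page URLs that contain it.
function buildIndex(pages: PageRecord[]): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const page of pages) {
    for (const token of tokenize(`${page.title} ${page.text}`)) {
      let urls = index.get(token);
      if (!urls) {
        urls = new Set<string>();
        index.set(token, urls);
      }
      urls.add(page.url);
    }
  }
  return index;
}

// AND query: return the (sorted) URLs of pages containing every query token.
function search(index: Map<string, Set<string>>, query: string): string[] {
  const tokens = tokenize(query);
  if (tokens.length === 0) return [];
  let result: Set<string> | undefined;
  for (const token of tokens) {
    const urls = index.get(token) ?? new Set<string>();
    result =
      result === undefined
        ? new Set(urls)
        : new Set([...result].filter((u) => urls.has(u)));
  }
  return [...(result ?? [])].sort();
}

// Serialize the index so it can be shipped as a static JSON asset with the site.
function serializeIndex(index: Map<string, Set<string>>): string {
  const obj: Record<string, string[]> = {};
  for (const [token, urls] of index) obj[token] = [...urls].sort();
  return JSON.stringify(obj);
}
```

A dedicated library such as FlexSearch would add features this sketch lacks, such as ranking, prefix matching, and compact storage. The overall shape stays the same: index at build time, query on the client.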
References
- Transferred from: https://github.com/curvenote/curvenote/issues/336
- discussion post asking for this: https://github.com/orgs/executablebooks/discussions/1161
There is an interesting package mentioned in this comment, https://github.com/executablebooks/jupyter-book/issues/815, and in the issue mentioned above.
Idea: search across MyST sites with an intersphinx-style config?
One pain point for many organizations is that they host their documentation in multiple places but link to it from a single place. For example, the 2i2c documentation, the Dask documentation, and our myst-tools site all have a top bar that links across pages hosted in different repositories.
One source of confusion is that the scope of search (in Sphinx) is restricted to the currently viewed sub-site, while I think many users expect a single search to work across all of the sites.
I wonder if it is possible to use configuration similar to intersphinx to pull in the search registries of other MyST websites and include them in the search index for the currently viewed site. This could happen as a build-time, server-side, or client-side operation. It might be unrealistic to pull all of the text from those sites, but maybe storing a registry of keywords and pages would be enough?
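Here is one way to picture the intersphinx-style idea, as a sketch under assumptions. Each site publishes a small registry that maps keywords to page paths. The current site then merges registries that have already been fetched, whether at build time, on a server, or in the client. The registry format, `RemoteSite`, `Hit`, and `mergeRegistries` are all hypothetical. Fetching is left out so the example stays self-contained.

```typescript
// Hypothetical per-site registry: keyword -> page paths relative to the site root.
interface SearchRegistry {
  [keyword: string]: string[];
}

// Hypothetical intersphinx-style config entry for another MyST site.
interface RemoteSite {
  name: string; // label shown next to results, e.g. "other-docs"
  baseUrl: string; // where that site is hosted
}

interface Hit {
  site: string;
  url: string;
}

// Join a base URL and a path without doubling or dropping the slash.
function joinUrl(base: string, path: string): string {
  return base.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}

// Merge the local registry with already-fetched remote registries.
// Keywords are lower-cased; remote paths are made absolute so links leave the current site.
function mergeRegistries(
  local: SearchRegistry,
  remotes: { site: RemoteSite; registry: SearchRegistry }[],
): Map<string, Hit[]> {
  const merged = new Map<string, Hit[]>();
  const add = (keyword: string, hit: Hit): void => {
    const key = keyword.toLowerCase();
    const hits = merged.get(key) ?? [];
    hits.push(hit);
    merged.set(key, hits);
  };
  for (const [keyword, paths] of Object.entries(local)) {
    for (const path of paths) add(keyword, { site: "local", url: path });
  }
  for (const { site, registry } of remotes) {
    for (const [keyword, paths] of Object.entries(registry)) {
      for (const path of paths) add(keyword, { site: site.name, url: joinUrl(site.baseUrl, path) });
    }
  }
  return merged;
}
```

Keeping only keywords and page paths, rather than full text, is what keeps the remote registries small enough to pull in.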
Out of scope for this ticket, but worth keeping in the back of our minds: RAG (Retrieval-Augmented Generation) is gaining traction these days. In short: you chat with a generative AI chatbot (e.g. ChatGPT), and you would like it to answer questions in the context of a given collection of documents. How does it work? Much like search engines do, by prebuilding an "index". Retraining the chatbot on the documents, or sending all the documents along with each question, would be too costly. Instead, you split the documents into chunks once and for all. For each chunk you compute a vector (an embedding of the chunk in some vector space) that summarizes what the chunk is about; together these vectors are the equivalent of an index. Then, when you ask a question, a vector is computed for the question and matched against the vectors of all the chunks to retrieve those that are likely related. Finally, these chunks are fed to the chatbot as context.
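The retrieval step described above can be sketched as cosine-similarity ranking over precomputed chunk embeddings. This sketch assumes the vectors come from some embedding model, which is not shown. The `Chunk` type, the helper names, and the prompt format are illustrative assumptions, not the API of any particular RAG library.

```typescript
// A chunk of a page plus its embedding, assumed to be computed once at build time.
interface Chunk {
  id: string;
  text: string;
  vector: number[];
}

// Cosine similarity between two vectors of equal length (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank every chunk against the question's vector and keep the k most similar.
function retrieve(chunks: Chunk[], queryVector: number[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(chunk.vector, queryVector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.chunk);
}

// Feed the retrieved chunks back to the chatbot as context (prompt wording is illustrative).
function buildPrompt(question: string, context: Chunk[]): string {
  const ctx = context.map((c) => `[${c.id}] ${c.text}`).join("\n");
  return `Answer using only the context below.\n\n${ctx}\n\nQuestion: ${question}`;
}
```

This is a linear scan over all chunks. With many chunks, systems typically switch to an approximate nearest-neighbour index, but the published artifact (one vector per chunk) stays the same.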
So we can foresee that, at some point, building and publishing a collection of vectors will be as standard a part of building a MyST website as building and publishing a search index. The same goes for cross-site vectors, tailored chatbots, etc.
And that future might not be so distant. A student of mine is currently building a proof-of-concept chatbot tailored to my course notes, and so is a student of a colleague.
Closed by #1530 and jupyter-book/myst-theme#470