cherrytree
Index - Apache Lucene or Solr
DocFetcher's core library uses an Apache Lucene index, and according to its websites that is the reason it is faster.
It would be great if cherrytree used or implemented a recent third-party indexing library to speed up search when the content is huge.
Why not use the SQLite FTS5 extension (https://sqlite.org/fts5.html)? I'm currently experimenting with it. Most indexing/search engines are based on the BM25 algorithm anyway.
FTS5 is an SQLite virtual table module that provides full-text search functionality to database applications. In their most elementary form, full-text search engines allow the user to efficiently search a large collection of documents for the subset that contain one or more instances of a search term. The search functionality provided to world wide web users by Google is, among other things, a full-text search engine, as it allows users to search for all documents on the web that contain, for example, the term "fts5".
I've been trying out fts5 with the CherryTree database. (Well, I made a copy and messed with that!)
I got some encouraging results.
My simple, preliminary test was:
- Create the virtual table with content based upon the node table:
create virtual table vnode using fts5(node_id, name, txt, content = 'node');
- Update its index (so it's not empty):
insert into vnode(vnode) values('rebuild');
- Do a search:
select * from vnode('sheet cheat') order by rank limit 3;
This seemed to return something sensible. The top hit was a note where I'd mentioned cheat sheets a few times, for example. (I deliberately searched for "sheet cheat" instead of "cheat sheet" to make sure it would tokenise sensibly.)
Perhaps this would be a relatively simple way of implementing full-text search in CherryTree?
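For anyone who wants to try the three steps above without touching a real CherryTree file, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite was compiled with FTS5 (most are, but not all). The in-memory `node` table, its three sample rows, and the `build_and_search` helper are made up for illustration; the real CherryTree `node` table has more columns than the three used here.

```python
import sqlite3


def build_and_search(query, limit=3):
    """Build an FTS5 index over a toy node table and run one ranked search."""
    con = sqlite3.connect(":memory:")

    # Hypothetical stand-in for CherryTree's node table (only the columns
    # that the virtual table below refers to).
    con.execute("create table node (node_id integer, name text, txt text)")
    con.executemany(
        "insert into node (node_id, name, txt) values (?, ?, ?)",
        [
            (1, "Shortcuts", "A cheat sheet of keyboard shortcuts. Print the cheat sheet."),
            (2, "Groceries", "Milk, eggs, bread."),
            (3, "Linen", "Wash every sheet on Sunday."),
        ],
    )

    # Step 1: external-content FTS5 table that reads its text from node.
    con.execute(
        "create virtual table vnode using fts5(node_id, name, txt, content = 'node')"
    )
    # Step 2: populate the index from the content table.
    con.execute("insert into vnode(vnode) values('rebuild')")
    # Step 3: search; ordering by rank puts the best match first.
    rows = con.execute(
        "select node_id, name from vnode(?) order by rank limit ?",
        (query, limit),
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(build_and_search("sheet cheat"))
```

With these sample rows, searching for "sheet cheat" should match only the node that contains both words, regardless of their order. Note that an external-content table is not kept in sync automatically: after the `node` table changes, the index needs another 'rebuild' (or triggers that update it), which a real integration would have to handle.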