Index - Apache Lucene or Solr

Open prog20901 opened this issue 7 years ago • 2 comments

DocFetcher's core library uses an Apache Lucene index, which, according to its website, is why it is so fast.

It would be great if CherryTree used or implemented a recent third-party indexing library to speed up search when the content is large.

prog20901 avatar Aug 03 '17 03:08 prog20901

Why not use the SQLite FTS5 extension (https://sqlite.org/fts5.html)? I'm currently experimenting with it. Most indexing/search tools are based on the BM25 algorithm anyway.

FTS5 is an SQLite virtual table module that provides full-text search functionality to database applications. In their most elementary form, full-text search engines allow the user to efficiently search a large collection of documents for the subset that contain one or more instances of a search term. The search functionality provided to world wide web users by Google is, among other things, a full-text search engine, as it allows users to search for all documents on the web that contain, for example, the term "fts5".

ghost avatar Oct 15 '17 06:10 ghost

I've been trying out fts5 with the CherryTree database. (Well, I made a copy and messed with that!)

I got some encouraging results.

My simple, preliminary test was:

  1. Create the virtual table with content based upon the node table: create virtual table vnode using fts5(node_id, name, txt, content = 'node');
  2. Update its index (so it's not empty): insert into vnode(vnode) values('rebuild');
  3. Do a search: select * from vnode('sheet cheat') order by rank limit 3;
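
For anyone who wants to try the three steps above without touching a real CherryTree file, here is a rough Python sketch. It is only an illustration under a few assumptions: the in-memory `node` table is a stand-in with just the three columns named in step 1 (the real table has more), the sample notes are made up, the helper names `build_fts_index` and `search` are hypothetical, and the SQLite library bundled with your Python must have FTS5 compiled in. The search uses `vnode MATCH ?`, which the FTS5 documentation lists as equivalent to the `vnode('...')` form in step 3.

```python
import sqlite3


def build_fts_index(conn: sqlite3.Connection) -> None:
    """Steps 1 and 2: create the external-content FTS5 table and fill its index."""
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS vnode "
        "USING fts5(node_id, name, txt, content='node')"
    )
    # The special 'rebuild' command re-reads every row of the content table.
    conn.execute("INSERT INTO vnode(vnode) VALUES('rebuild')")
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 3):
    """Step 3: return (node_id, name) pairs, best matches first."""
    return conn.execute(
        "SELECT node_id, name FROM vnode WHERE vnode MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


# Stand-in for a copy of the CherryTree database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node(node_id INTEGER, name TEXT, txt TEXT)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [
        (1, "Vim tips", "My cheat sheet for Vim. Another cheat sheet is linked here."),
        (2, "Git notes", "See the git cheat sheet for the common commands."),
        (3, "Accounts", "Balance sheet for the last quarter."),
    ],
)

build_fts_index(conn)
hits = search(conn, "sheet cheat")  # word order in the query does not matter
print(hits)
```

Only the two notes containing both words come back; the "Accounts" note mentions "sheet" but not "cheat", so it is left out. One caveat from the FTS5 documentation: an external-content table like this is not kept in sync automatically, so after notes change the index has to be updated by triggers or by running the 'rebuild' command again.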

This seemed to return something sensible. The top hit was a note where I'd mentioned cheat sheets a few times, for example. (I deliberately searched for "sheet cheat" instead of "cheat sheet" to make sure it would tokenise sensibly.)

Perhaps this would be a relatively simple way of implementing a full text search in CherryTree?

rmwiseman avatar Oct 24 '23 12:10 rmwiseman