make search actually scale well
I implemented a command-line version of search for my open-source workflowy.
It even implements query operators like "or","and", and "not".
It depends on this npm module for self-balancing binary search trees.
hm, why binary search trees? i was thinking a reverse index, but I'm also considering whether it's worth it to make the search fuzzy
mostly orthogonal thought: one thing it occured to me you should be aware of is that my app, thus far, is entirely client-side. it uses localStorage for all persistence. might change one day
This is relatively important to me. Fuzzy search would be nice (maybe as a plugin someday? though I don't know what kind of API would realistically work for that), but good performance for the exact search would be very nice.
Not in js, but worth noting as the only fuzzy regex tool I know of: https://github.com/laurikari/tre
Does this sound feasible?
Separate the document into chunks, a chunk is a collection of rows that has up to k characters. We maintain a suffix array in each chunk over the concatenation of the rows in that chunk. To search for a query we check if its in every chunk. Runtime is O(n/k * log(k)) to search, O(k log(k)) to update after a row is changed. Memory is O(n), one extra integer for each character. n = number of characters, k = chunk size (around 1-10k)
I'd prefer to implement something that works well asymptotically. Your method saves a factor of 1K but not more, for very large N. I'm less concerned about auxiliary space, I think
You could set k=sqrt(n) :P Perhaps we could use a single dynamic suffix array? I think all operations are logarithmic. https://elar.urfu.ru/bitstream/10995/3710/3/RuSSIR_2011_04.pdf The limitation of this paper is text can only be inserted from the left, but thats fine in our case since we can remove the row and add it back. edit: that paper is very sparse on details and is probably flawed, but I'll think about it more
yeah, k=n sounds fine to me!
(oh wait.. does that make updates slow? that can be avoided probably right?)
@platers and i discussed offline a bit and a bigger issue is that the queries need to be mostly implemented server-side for efficiency. so firebase would need to use cloud functions, SQLite ideally a specialized sql query (ideally a stored procedure if using a different sql implementation)