vimflowy icon indicating copy to clipboard operation
vimflowy copied to clipboard

make search actually scale well

Open WuTheFWasThat opened this issue 10 years ago • 10 comments

WuTheFWasThat avatar Jul 10 '15 04:07 WuTheFWasThat

I implemented a command-line version of search for my open-source workflowy.

It even implements query operators like "or","and", and "not".

It depends on this npm module for self-balancing binary search trees.

CurtisSV avatar Jul 15 '15 14:07 CurtisSV

hm, why binary search trees? i was thinking a reverse index, but I'm also considering whether it's worth it to make the search fuzzy

mostly orthogonal thought: one thing it occured to me you should be aware of is that my app, thus far, is entirely client-side. it uses localStorage for all persistence. might change one day

WuTheFWasThat avatar Jul 15 '15 17:07 WuTheFWasThat

This is relatively important to me. Fuzzy search would be nice (maybe as a plugin someday? though I don't know what kind of API would realistically work for that), but good performance for the exact search would be very nice.

paulfchristiano avatar Oct 29 '15 23:10 paulfchristiano

Not in js, but worth noting as the only fuzzy regex tool I know of: https://github.com/laurikari/tre

za3k avatar Oct 30 '15 21:10 za3k

Does this sound feasible?

Separate the document into chunks, a chunk is a collection of rows that has up to k characters. We maintain a suffix array in each chunk over the concatenation of the rows in that chunk. To search for a query we check if its in every chunk. Runtime is O(n/k * log(k)) to search, O(k log(k)) to update after a row is changed. Memory is O(n), one extra integer for each character. n = number of characters, k = chunk size (around 1-10k)

platers avatar Dec 30 '20 21:12 platers

I'd prefer to implement something that works well asymptotically. Your method saves a factor of 1K but not more, for very large N. I'm less concerned about auxiliary space, I think

WuTheFWasThat avatar Dec 30 '20 22:12 WuTheFWasThat

You could set k=sqrt(n) :P Perhaps we could use a single dynamic suffix array? I think all operations are logarithmic. https://elar.urfu.ru/bitstream/10995/3710/3/RuSSIR_2011_04.pdf The limitation of this paper is text can only be inserted from the left, but thats fine in our case since we can remove the row and add it back. edit: that paper is very sparse on details and is probably flawed, but I'll think about it more

platers avatar Dec 30 '20 23:12 platers

yeah, k=n sounds fine to me!

WuTheFWasThat avatar Jan 06 '21 03:01 WuTheFWasThat

(oh wait.. does that make updates slow? that can be avoided probably right?)

WuTheFWasThat avatar Jan 06 '21 03:01 WuTheFWasThat

@platers and i discussed offline a bit and a bigger issue is that the queries need to be mostly implemented server-side for efficiency. so firebase would need to use cloud functions, SQLite ideally a specialized sql query (ideally a stored procedure if using a different sql implementation)

WuTheFWasThat avatar Jan 09 '21 02:01 WuTheFWasThat