lucy.js
use inverted index on smaller dataset for now
It is currently very slow to create the index on the current tweets.json, so I would try it with a subset. Also, the index takes a while to show up in the Chrome Dev Console, even after I refreshed it.
Ah, for some reason refreshing while the console is open doesn't actually refresh the view, but when I closed the console and reopened it, the index showed up immediately.
Keep in mind that console.log()-ing every single item is probably making this way slower than it should be.
Oh yeah, that was just for my debugging purposes. When I'm finished, I'll try again with the big dataset.
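For reference, here is a minimal sketch of the two ideas discussed above: building the index on a subset and putting per-item logging behind a flag. It is not the code in invindex.js. The names `buildInvertedIndex` and `DEBUG`, the `text` field on each tweet, and the sample data are all assumptions made for illustration.

```javascript
// Hypothetical sketch: none of these names come from invindex.js.
const DEBUG = false; // flip to true only while debugging; logging per item is slow

// Build a word -> Set of tweet positions index over the first `limit` tweets.
// Assumes each tweet is an object with a string `text` field.
function buildInvertedIndex(tweets, limit) {
  const subset = tweets.slice(0, limit); // work on a smaller dataset for now
  const index = new Map();
  subset.forEach((tweet, id) => {
    const words = tweet.text.toLowerCase().split(/\W+/).filter(Boolean);
    for (const word of words) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(id);
      if (DEBUG) console.log("indexed", word, "->", id);
    }
  });
  return index;
}

// Made-up sample data standing in for tweets.json.
const sampleTweets = [
  { text: "Hello world" },
  { text: "Hello again" },
  { text: "Not indexed yet" },
];
const index = buildInvertedIndex(sampleTweets, 2);
```

With `limit` set to 2, only the first two sample tweets are indexed, so the third tweet's words never enter the index.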
Oh, you’re making changes to invindex.js right now?
Oh no, I was done for the night :) I will work on it tomorrow.
On Wed, Nov 26, 2014 at 2:13 AM, Lea Verou [email protected] wrote:
Oh, you’re making changes to invindex.js right now?
Amy X. Zhang | http://amyxz.com | @amyxzh
@manalinaik Is it normal that generating the prefix tree took 14 minutes here for 1/3 of the dataset of the repo? (I forgot how long it used to take, but I don't recall it taking that long before)
Yeah, it's gotten significantly slower. Inserting a large number of tweets asynchronously resulted in a lot of insertion errors because different threads would try to insert the same node in the prefix tree concurrently. And after an insertion error on a given key, subsequent calls to get on that key would also fail. I had to change the code to insert every tweet one at a time, which is really slow. But I couldn't find another way to avoid all the insertion errors without doing so.
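One possible middle ground, sketched below, is to serialize inserts per key instead of globally. This assumes the errors come from overlapping asynchronous inserts on the same key, which may not match what actually happens in the repo. `createSerializedInserter` and `fakeInsertNode` are hypothetical names, and `fakeInsertNode` is only a stand-in for whatever asynchronous storage call backs the prefix tree.

```javascript
// Hypothetical sketch: wraps an async insert so calls on the same key run one
// after another, while calls on different keys may still overlap.
function createSerializedInserter(insertNode) {
  const pending = new Map(); // key -> promise of the last queued insert for that key
  return function insert(key, value) {
    const previous = pending.get(key) || Promise.resolve();
    // Wait for the previous insert on this key to settle, even if it failed.
    const next = previous.catch(() => {}).then(() => insertNode(key, value));
    pending.set(key, next);
    // Drop the entry once this insert settles, unless a newer one replaced it.
    const cleanup = () => {
      if (pending.get(key) === next) pending.delete(key);
    };
    next.then(cleanup, cleanup);
    return next;
  };
}

// Stand-in store (an assumption) that rejects overlapping inserts on one key.
const inFlight = new Set();
const stored = new Map();
function fakeInsertNode(key, value) {
  if (inFlight.has(key)) {
    return Promise.reject(new Error("concurrent insert on " + key));
  }
  inFlight.add(key);
  return new Promise((resolve) => {
    setTimeout(() => {
      if (!stored.has(key)) stored.set(key, []);
      stored.get(key).push(value);
      inFlight.delete(key);
      resolve();
    }, 5);
  });
}

const insert = createSerializedInserter(fakeInsertNode);
const done = Promise.all([insert("he", 1), insert("he", 2), insert("hi", 3)]);
```

In this sketch the two inserts on "he" run back to back, while the insert on "hi" does not wait for them. Whether that would recover much speed depends on how often tweets share prefix nodes, which this example does not measure.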