lucy.js

use inverted index on smaller dataset for now

Open amyxzhang opened this issue 10 years ago • 7 comments

It is currently very slow to create the index on the current tweets.json, so I would try it with a subset. Also, the index takes a while to show up in the Chrome Dev Console, even after I refresh it.

amyxzhang avatar Nov 26 '14 05:11 amyxzhang
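
For readers following along, trying the index on a subset could look roughly like the sketch below. This is not the code in invindex.js: `buildInvertedIndex`, the `limit` parameter, and the assumption that each tweet is an object with a string `text` field are all hypothetical, and the tokenizer is deliberately simplistic.

```javascript
// Hypothetical helper: build an in-memory inverted index over only the
// first `limit` tweets. Assumes each tweet has a string `text` field.
function buildInvertedIndex(tweets, limit) {
  const index = new Map(); // term -> Set of tweet positions in the subset
  const subset = tweets.slice(0, limit);
  subset.forEach((tweet, position) => {
    // Naive tokenizer: lowercase, split on non-word characters.
    const terms = tweet.text.toLowerCase().split(/\W+/).filter(Boolean);
    for (const term of terms) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(position);
    }
  });
  return index;
}

// Made-up sample data; only the first two tweets are indexed.
const sampleTweets = [
  { text: "Hello world" },
  { text: "hello again" },
  { text: "not indexed" },
];
const index = buildInvertedIndex(sampleTweets, 2);
console.log([...index.get("hello")]); // positions of tweets containing "hello"
```

Raising `limit` step by step gives a rough idea of how build time grows before committing to the full file.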

Ah, for some reason, refreshing while the console is open doesn't actually refresh the view, but when I closed the console and reopened it, the index showed up immediately.

amyxzhang avatar Nov 26 '14 05:11 amyxzhang

Keep in mind that console.log()-ing every single item is probably making this way slower than it should be.

LeaVerou avatar Nov 26 '14 07:11 LeaVerou
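
A common way to keep per-item logging available for debugging without paying for it on every run is to put it behind a flag. The sketch below is only an illustration: `DEBUG`, `debugLog`, and `indexAll` are hypothetical names, not part of lucy.js.

```javascript
// Hypothetical flag: flip to true only while debugging.
const DEBUG = false;

// Logs only when DEBUG is on, so the hot loop stays quiet by default.
function debugLog(...args) {
  if (DEBUG) console.log(...args);
}

// Hypothetical indexing loop: `insert` stands in for whatever adds one
// item to the index. Returns the number of items processed.
function indexAll(items, insert) {
  items.forEach((item, i) => {
    insert(item);
    debugLog("indexed item", i);
  });
  // One summary line instead of one line per item.
  console.log(`indexed ${items.length} items`);
  return items.length;
}
```

With the flag off, the only console output is the single summary line at the end.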

Oh yeah, that was just for my debugging purposes. When I'm finished, I'll try again with the big dataset.

amyxzhang avatar Nov 26 '14 07:11 amyxzhang

Oh, you’re making changes to invindex.js right now?

LeaVerou avatar Nov 26 '14 07:11 LeaVerou

Oh no, I was done for the night :) I will work on it tomorrow.

amyxzhang avatar Nov 26 '14 07:11 amyxzhang

@manalinaik Is it normal that generating the prefix tree took 14 minutes here for 1/3 of the dataset in the repo? (I forget how long it used to take, but I don't recall it taking that long before.)

LeaVerou avatar Dec 07 '14 12:12 LeaVerou

Yeah, it's gotten significantly slower. Inserting a large number of tweets asynchronously resulted in a lot of insertion errors because different threads would try to insert the same node in the prefix tree concurrently. And after an insertion error on a given key, subsequent calls to get on that key would also fail. I had to change the code to insert every tweet one at a time, which is really slow, but I couldn't find another way to avoid the insertion errors.

manalinaik avatar Dec 07 '14 15:12 manalinaik
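
One technique sometimes used for this kind of race, offered here only as a possibility and not as what the repo does, is to cache the in-flight creation promise per key so that concurrent inserts for the same node share a single creation instead of each trying to create it. The sketch below uses a made-up in-memory store whose `createNode` rejects duplicates to mimic the insertion errors described above; `makeStore`, `makeGetOrCreate`, and `insertWord` are hypothetical names, and the real storage layer behind the prefix tree may behave differently.

```javascript
// Hypothetical async store standing in for whatever backs the prefix tree.
// createNode rejects if a key is created twice, mimicking insertion errors.
function makeStore() {
  const nodes = new Map();
  return {
    async createNode(key) {
      await null; // yield so that concurrent callers can interleave
      if (nodes.has(key)) throw new Error(`duplicate node: ${key}`);
      const node = { key, tweets: [] };
      nodes.set(key, node);
      return node;
    },
    size: () => nodes.size,
  };
}

// Wraps the store so each key is created at most once: later callers
// reuse the promise started by the first caller.
function makeGetOrCreate(store) {
  const inFlight = new Map(); // key -> Promise resolving to the node
  return function getOrCreate(key) {
    if (!inFlight.has(key)) inFlight.set(key, store.createNode(key));
    return inFlight.get(key);
  };
}

// Walks the prefixes of `word` ("c", "ca", "cat"), creating nodes as
// needed, and records the tweet id on the node for the full word.
async function insertWord(getOrCreate, word, tweetId) {
  for (let i = 1; i <= word.length; i++) {
    const node = await getOrCreate(word.slice(0, i));
    if (i === word.length) node.tweets.push(tweetId);
  }
}

async function demo() {
  const store = makeStore();
  const getOrCreate = makeGetOrCreate(store);
  // All three inserts run concurrently and share the "c" and "ca" nodes.
  await Promise.all([
    insertWord(getOrCreate, "cat", 1),
    insertWord(getOrCreate, "car", 2),
    insertWord(getOrCreate, "cat", 3),
  ]);
  const cat = await getOrCreate("cat");
  return { nodeCount: store.size(), catTweets: cat.tweets };
}

demo().then((result) => console.log(result));
```

Two caveats with this sketch: the promise cache grows with the number of nodes, and a rejected creation stays cached, so a real version would need to evict failed entries before retrying.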