
Save indexes directly to Lambda Function

Open rlingineni opened this issue 6 years ago • 5 comments

You will have to update the indexing function, which currently stores the indexes in an S3 bucket alongside the Lambda function.

It can take almost ~2 seconds to get all the virtual indexes from S3. Considering each file is only about ~1 MB, if we save the index onto the Lambda function directly, we can shave that time off.

Of course, this could make the architecture a bit dirty, but the performance gains would be great.

rlingineni avatar Sep 30 '18 21:09 rlingineni

Hi,

Another way to speed up the search might be to use S3 Select https://aws.amazon.com/blogs/aws/s3-glacier-select/

This would reduce the need to fetch the whole index.
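To make the suggestion concrete, here is a sketch of how an S3 Select request could be shaped so that only matching index entries come back. The bucket, key, and the JSON layout of the index file are hypothetical; actually issuing the request needs the `aws-sdk` and a real bucket (via `s3.selectObjectContent(params)`), so this only builds the parameters.

```javascript
// Sketch: an S3 Select request that pulls only matching entries from an
// index file, rather than downloading the whole object. The bucket, key,
// and JSON layout below are hypothetical.
function buildSelectParams(bucket, key, term) {
  return {
    Bucket: bucket,
    Key: key,
    ExpressionType: "SQL",
    // Select only the entries whose token matches the search term.
    Expression: `SELECT * FROM S3Object[*].entries[*] e WHERE e.token = '${term}'`,
    InputSerialization: { JSON: { Type: "DOCUMENT" } },
    OutputSerialization: { JSON: { RecordDelimiter: "\n" } },
  };
}

const params = buildSelectParams(
  "my-index-bucket",
  "indexes/part-0.json",
  "lambda"
);
console.log(params.Expression);
```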

Cheers, Hans

seriousme avatar Oct 20 '18 07:10 seriousme

We would still have to load an entire index into the function's memory, since a subset of an index isn't useful on its own.

rlingineni avatar Oct 20 '18 07:10 rlingineni

It might work for larger datasets but then you need to alter the query algorithm as well. Standard lunrjs would not be able to work with that.

seriousme avatar Oct 20 '18 09:10 seriousme

btw: if updates are infrequent (e.g. only during nightly batches) and the index does not need to be super current, then you might include the index with the Lambda bundle, so that every time the index is updated a new version of the Lambda is deployed.
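The nightly-batch flow above could look roughly like this: stage the function code and the freshly built index together, then push a new bundle. The function name, file names, and paths are hypothetical, and the actual `zip`/`aws` deploy commands are only echoed here, not executed.

```shell
# Sketch: redeploy the function whenever the index is rebuilt.
# Function name, bucket, and file paths are hypothetical.
set -e
BUILD_DIR=$(mktemp -d)

# Stand-ins for the function code and the freshly built index.
echo 'exports.handler = async () => {};' > "$BUILD_DIR/index.js"
echo '{"docs":[]}' > "$BUILD_DIR/search_index.json"

ls "$BUILD_DIR"

# The real deploy step would zip the staged files and publish a new
# function version (shown here, not executed):
echo "zip -j bundle.zip $BUILD_DIR/*"
echo "aws lambda update-function-code --function-name my-search-fn --zip-file fileb://bundle.zip"
```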

seriousme avatar Oct 20 '18 09:10 seriousme

Right, yeah, that's what I was thinking: upload it with the Lambda bundle. Even if updates were frequent, I don't think it would matter. It doesn't cost us anything to update Lambda functions, and usually, from experience, a new bundle doesn't mean downtime.

As far as lunrjs goes, I agree, changes would have to be made to the core. There should be a way in lunrjs to load multiple indexes for a server-side use case.
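Short of core changes, a server-side wrapper could search each index separately and merge the hits. A minimal sketch, assuming each per-index search returns lunr-style results (`{ ref, score }`); the shard data here is made up for illustration.

```javascript
// Sketch: merge search results from several per-shard indexes.
// Assumes lunr-style result objects ({ ref, score }); shard data is made up.
function mergeResults(resultSets, limit) {
  const merged = [];
  for (const results of resultSets) merged.push(...results);
  // Highest-scoring hits first, regardless of which shard they came from.
  merged.sort((a, b) => b.score - a.score);
  return merged.slice(0, limit);
}

const shardA = [
  { ref: "doc1", score: 0.9 },
  { ref: "doc3", score: 0.2 },
];
const shardB = [{ ref: "doc2", score: 0.5 }];

console.log(mergeResults([shardA, shardB], 2).map((r) => r.ref)); // → [ 'doc1', 'doc2' ]
```

One caveat with naive merging: scores from separately built lunr indexes are not strictly comparable, since term statistics differ per index, which is part of why first-class multi-index support in the core would be nicer.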

rlingineni avatar Oct 23 '18 04:10 rlingineni