Examine

Reading from file takes too long inside container

Status: Open. sayeduzzamancuet opened this issue 2 years ago • 4 comments

We deployed our index inside a Docker container (.NET Core 3.1; the host OS is Debian). We placed the index files in a shared, mounted directory. The problem is that accessing the index files takes a huge amount of time (60 seconds) for each request. We checked: the index is only a few MB in size, with around 200K small documents. Interestingly, the same thing takes 120 ms in our development environment running on Windows. Am I missing something? @Shazwazza

sayeduzzamancuet, Nov 18 '21 13:11

Lucene does not work very well over network shares. This is a limitation of Lucene, not something that can simply be 'fixed'. You could change the directory to an in-memory Lucene index instead, but that means the index will not be persisted.
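In Lucene.NET 4.8 the in-memory option is RAMDirectory, which has a constructor that copies the contents of another Directory (for example an FSDirectory opened on the share). The Python sketch below only models the idea in a language-neutral way: pay the cost of reading over the share once at startup, then serve every later read from memory. `load_index_into_memory` and `read_range` are hypothetical helpers, not Lucene APIs. The memory cost is roughly the total size of the index files, and the snapshot disappears when the process restarts.

```python
import os


def load_index_into_memory(index_dir: str) -> dict:
    """Read every file of an index folder once and keep the bytes in memory.

    Hypothetical helper: later lookups hit this dict instead of the slow
    mounted share. Nothing is written back, so the snapshot is lost on restart.
    """
    snapshot = {}
    for name in sorted(os.listdir(index_dir)):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):  # skip sub-folders; an index is a flat set of files
            with open(path, "rb") as f:
                snapshot[name] = f.read()
    return snapshot


def read_range(snapshot: dict, name: str, offset: int, length: int) -> bytes:
    """Serve a small read from memory, the way a search reads part of a file."""
    return snapshot[name][offset:offset + length]
```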

Shazwazza, Nov 22 '21 01:11

@Shazwazza, is there any way to store the index in a database or a Redis cache, so that instead of reading it from storage we can read it from memory?

sayeduzzamancuet, Nov 24 '21 02:11

Our scenario is this: a process runs once a day to build the index. Once the index is ready, it is stored in a shared folder, and another API reads it to serve user requests. This is a very simple approach, so we would expect Lucene to handle it without running into any limitations.
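One way to keep the share out of the query path in this daily-build scenario is for the API to copy the published index to its own local disk and open the copy there. The following is a minimal Python sketch under two assumptions:

- The build process has finished writing before the copy starts.
- The newest commit is the `segments_N` file with the highest N. Lucene writes N in base 36.

`latest_generation` and `refresh_local_copy` are hypothetical helpers. Opening the returned path with Lucene.NET is not shown.

```python
import os
import shutil


def latest_generation(index_dir: str) -> int:
    """Return the generation of the newest segments_N commit file (N is base 36)."""
    gens = [int(name[len("segments_"):], 36)
            for name in os.listdir(index_dir)
            if name.startswith("segments_")]  # "segments.gen" does not match
    if not gens:
        raise FileNotFoundError(f"no segments_N file in {index_dir}")
    return max(gens)


def refresh_local_copy(share_dir: str, local_root: str) -> str:
    """Copy the published index to local disk once per new commit; return the local path."""
    gen = latest_generation(share_dir)
    target = os.path.join(local_root, f"index-{gen}")
    if not os.path.isdir(target):
        tmp = target + ".tmp"
        shutil.rmtree(tmp, ignore_errors=True)  # discard a previously interrupted copy
        shutil.copytree(share_dir, tmp)
        os.rename(tmp, target)  # the API never sees a half-copied folder
    return target
```

Naming each local copy after its generation lets a running API keep searching the previous copy until it switches to the new path. Old folders still need to be cleaned up separately.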

sayeduzzamancuet, Nov 24 '21 02:11

A few things of note.

Default Directory

The default directory is dependent upon OS.


That is, if you call FSDirectory.Open() to get a directory instance on Debian, it defaults to NIOFSDirectory. I would recommend trying one of the other directories.
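To pick a directory explicitly in Lucene.NET, construct it yourself instead of calling FSDirectory.Open(), for example `new MMapDirectory(new DirectoryInfo(path))` or `new SimpleFSDirectory(new DirectoryInfo(path))`. Which one is fastest on a particular mount is an empirical question, so it is worth measuring.

The Python sketch below is a rough, language-neutral way to do that. It times the same random small reads against one file using two strategies:

- positional reads (`os.pread`), roughly the NIOFSDirectory style;
- slices of a memory mapping (`mmap`), roughly the MMapDirectory style.

This is only an analogy to what the Lucene.NET classes do internally. The function names are hypothetical, `os.pread` makes the sketch Unix-only, and the file must be at least `chunk` bytes long.

```python
import mmap
import os
import random
import time


def random_offsets(size, chunk, count, seed=42):
    """Pick reproducible random offsets for reads of `chunk` bytes."""
    rng = random.Random(seed)
    return [rng.randrange(0, size - chunk + 1) for _ in range(count)]


def read_with_pread(path, offsets, chunk):
    """Positional reads on one descriptor (roughly the NIOFSDirectory style)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return b"".join(os.pread(fd, chunk, off) for off in offsets)
    finally:
        os.close(fd)


def read_with_mmap(path, offsets, chunk):
    """Slices of a read-only memory mapping (roughly the MMapDirectory style)."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return b"".join(m[off:off + chunk] for off in offsets)


def compare(path, chunk=64, count=10_000):
    """Time both strategies on the same reads; returns {name: (seconds, data)}."""
    offsets = random_offsets(os.path.getsize(path), chunk, count)
    results = {}
    for name, reader in (("pread", read_with_pread), ("mmap", read_with_mmap)):
        start = time.perf_counter()
        data = reader(path, offsets, chunk)
        results[name] = (time.perf_counter() - start, data)
    return results
```

The first pass warms the OS page cache, so run it a few times. Also run it against a copy of the same file on local disk. If the local copy is much faster with both strategies, the mount rather than the Directory choice is the bottleneck.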

"Sharing" an Index

Since Lucene's built-in directories are optimized to work with a local disk, the way to share an index between multiple applications is Lucene.Net.Replicator. It allows you to publish an index to a central location, and individual replication clients handle copying the index to each local disk for optimal performance. I don't know much about Examine, but I see that it references Lucene.Net.Replicator, so I suspect there is a way to use it.

Note that the replicator's timeout has a bug: it is set to only 1 second instead of 60 seconds, so you will need to adjust the timeout to make it work in any current release (the fix will be in 4.8.0-beta00016). See https://github.com/apache/lucenenet/pull/534.
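To make the division of labour concrete, here is a toy, in-memory Python model of the publish/replicate flow. It is not the Lucene.Net.Replicator API, which has its own types such as LocalReplicator, IndexRevision and ReplicationClient. `Revision`, `Publisher` and `ReplicaClient` are hypothetical names.

The client works in three steps:

1. It asks the central location whether a newer revision exists.
2. If one does, it copies only the files it does not already hold an identical copy of.
3. It then drops the files the new revision no longer uses.

Lucene writes new segment files rather than modifying existing ones, so most updates only need the new files.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    version: int
    files: dict  # file name -> bytes; stands in for one published index commit


class Publisher:
    """Central location that always holds the latest published revision."""

    def __init__(self):
        self.latest = None

    def publish(self, revision):
        if self.latest is not None and revision.version <= self.latest.version:
            raise ValueError("revision is not newer than the current one")
        self.latest = revision

    def check_for_update(self, client_version):
        """Return the latest revision if it is newer than the client's, else None."""
        if self.latest is None:
            return None
        if client_version is not None and client_version >= self.latest.version:
            return None
        return self.latest


class ReplicaClient:
    """Keeps a local copy (here a dict standing in for local disk) in sync."""

    def __init__(self, publisher):
        self.publisher = publisher
        self.version = None
        self.local_files = {}

    def update_now(self):
        """Pull the newest revision if there is one; return how many files were copied."""
        rev = self.publisher.check_for_update(self.version)
        if rev is None:
            return 0
        copied = 0
        for name, data in rev.files.items():
            if self.local_files.get(name) != data:  # only fetch what we lack
                self.local_files[name] = data
                copied += 1
        for name in set(self.local_files) - set(rev.files):
            del self.local_files[name]  # files no longer referenced by the new commit
        self.version = rev.version
        return copied
```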

Custom Directories

The Directory is just an abstraction, so you can also write a custom implementation if the built-in options don't meet your needs. There are also some third-party implementations, such as AzureDirectory, that you can either use directly or study to learn how to build your own.
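Lucene's Directory is an abstract class with members such as `ListAll()`, `FileLength()` and `OpenInput()`, and the rest of Lucene works only through that interface. The Python sketch below models the pattern with a hypothetical two-method interface. It provides a plain file-system implementation plus a wrapper that reads each file from a slow backing store once and then serves it from memory, which is the kind of behaviour you would want in front of a network share. It assumes files are not rewritten in place while cached. It is not a port of AzureDirectory or of any Lucene.NET class.

```python
import os
from abc import ABC, abstractmethod


class IndexDirectory(ABC):
    """Hypothetical, heavily simplified stand-in for Lucene's Directory abstraction."""

    @abstractmethod
    def list_all(self) -> list: ...

    @abstractmethod
    def read_file(self, name: str) -> bytes: ...


class FileSystemDirectory(IndexDirectory):
    """Reads files straight from a folder (e.g. the mounted share)."""

    def __init__(self, path):
        self.path = path

    def list_all(self):
        return sorted(os.listdir(self.path))

    def read_file(self, name):
        with open(os.path.join(self.path, name), "rb") as f:
            return f.read()


class CachingDirectory(IndexDirectory):
    """Reads each file from the backing store once, then serves it from memory."""

    def __init__(self, backing: IndexDirectory):
        self.backing = backing
        self.cache = {}
        self.backing_reads = 0  # how often we actually hit the slow store

    def list_all(self):
        return self.backing.list_all()

    def read_file(self, name):
        if name not in self.cache:
            self.cache[name] = self.backing.read_file(name)
            self.backing_reads += 1
        return self.cache[name]
```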

NightOwl888, Nov 24 '21 03:11