elasticsearch-plugin-bundle

Baseform: memory optimization

Open ThaDafinser opened this issue 7 years ago • 5 comments

When I use the baseform plugin on a large number of documents (> 1,000,000), I get this error:

[2017-04-06T07:28:07,712][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [ultimate-1] fatal error in thread [elasticsearch[ultimate-1][clusterService#updateTask][T#1]], exiting
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:3236) ~[?:1.8.0_121]
	at org.xbib.elasticsearch.common.fsa.FSABuilder.expandBuffers(FSABuilder.java:468) ~[?:?]
	at org.xbib.elasticsearch.common.fsa.FSABuilder.serialize(FSABuilder.java:418) ~[?:?]
	at org.xbib.elasticsearch.common.fsa.FSABuilder.freezeState(FSABuilder.java:352) ~[?:?]
	at org.xbib.elasticsearch.common.fsa.FSABuilder.add(FSABuilder.java:204) ~[?:?]
	at org.xbib.elasticsearch.common.fsa.Dictionary.loadLines(Dictionary.java:43) ~[?:?]
	at org.xbib.elasticsearch.index.analysis.baseform.BaseformTokenFilterFactory.createDictionary(BaseformTokenFilterFactory.java:39) ~[?:?]
	at org.xbib.elasticsearch.index.analysis.baseform.BaseformTokenFilterFactory.<init>(BaseformTokenFilterFactory.java:27) ~[?:?]
	at org.xbib.elasticsearch.plugin.bundle.BundlePlugin$$Lambda$379/386311625.get(Unknown Source) ~[?:?]
	at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:361) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenFilterFactories(AnalysisRegistry.java:171) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:155) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.index.IndexService.<init>(IndexService.java:145) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:363) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:427) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:392) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$1.execute(MetaDataCreateIndexService.java:364) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.cluster.service.ClusterService.executeTasks(ClusterService.java:679) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.cluster.service.ClusterService.calculateTaskOutputs(ClusterService.java:658) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:617) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.cluster.service.ClusterService$UpdateTask.run(ClusterService.java:1117) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:544) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:238) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:201) ~[elasticsearch-5.3.0.jar:5.3.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_121]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_121]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]

ThaDafinser avatar Apr 06 '17 05:04 ThaDafinser

Thanks for the report.

It's not a leak, but the way the FSA is implemented makes it quite memory-hungry when an index is created. I will investigate how to reduce its memory footprint.

jprante avatar Apr 06 '17 07:04 jprante

I just tried building a lot of indices with the settings below.

Even when the filter is not used explicitly, the memory still seems to be required (I got the same exception).

Update: I'm now going to recreate all indices without the baseform filter defined and watch whether the plugin still crashes ES.

image

ThaDafinser avatar Apr 11 '17 06:04 ThaDafinser

Without using the baseform filter (I also removed the filter definition), there still seems to be a memory problem.

@jprante I only created 24 MB of indices/documents in total, but the JVM memory is full and Kibana is timing out again.

image image

ThaDafinser avatar Apr 11 '17 13:04 ThaDafinser

After disabling the whole plugin, the JVM memory usage is stable.

Quick idea: load the FSA only once, at ES startup?

image

ThaDafinser avatar Apr 12 '17 08:04 ThaDafinser

@jprante I'm sadly no Java geek, but I found this approach for Hunspell in the ES repo. They use a service, so the dictionary is only loaded once.

Maybe this would be a good idea?

https://github.com/elastic/elasticsearch/blob/ee802ad63c0f21d697a5095dd05dc6f94626ee4d/core/src/main/java/org/elasticsearch/index/analysis/HunspellTokenFilterFactory.java#L44
https://github.com/elastic/elasticsearch/blob/ee802ad63c0f21d697a5095dd05dc6f94626ee4d/core/src/main/java/org/elasticsearch/indices/analysis/HunspellService.java

ThaDafinser avatar Apr 27 '17 08:04 ThaDafinser
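
The load-once idea suggested in this thread could look roughly like the sketch below. It is not the plugin's actual code and not a copy of HunspellService: the class name DictionaryCache, the loader function, and the resource name are all hypothetical, and the string value only stands in for a real dictionary. The point is that every index creation asks a shared, node-level cache for the dictionary instead of building it again.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical node-level cache (illustration only, not plugin code).
 * Each dictionary resource is built at most once and then shared.
 */
final class DictionaryCache<D> {
    private final Map<String, D> cache = new ConcurrentHashMap<>();
    private final Function<String, D> loader;

    /** The loader is an assumption: it stands in for the expensive dictionary build. */
    DictionaryCache(Function<String, D> loader) {
        this.loader = loader;
    }

    /** Returns the cached dictionary, invoking the loader only on the first request. */
    D get(String resource) {
        // ConcurrentHashMap.computeIfAbsent applies the function at most once per key.
        return cache.computeIfAbsent(resource, loader);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        DictionaryCache<String> shared = new DictionaryCache<>(name -> {
            loads.incrementAndGet();      // count how often the expensive build runs
            return "dictionary:" + name;  // placeholder for the real structure
        });

        // Simulate three index creations that all need the same dictionary.
        for (int i = 0; i < 3; i++) {
            shared.get("baseform-dictionary.txt"); // hypothetical resource name
        }
        System.out.println("dictionary builds: " + loads.get());
    }
}
```

With a cache like this, a token filter factory would hold a reference to the shared instance and call get(...) in its constructor, so memory use would depend on the number of distinct dictionaries rather than on the number of indices. Whether this removes the peak during the first build is a separate question that the sketch does not address.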