performance-analyzer
Substantial logspam when tmp directory is marked "noexec"
Hi there. We have just set up ELK at our organization using OpenDistro and have been very happy with the products so far. Unfortunately our organization has some pretty restrictive configs and we have noticed some possible bugs along the way due to that.
On our elasticsearch server, which is running RHEL 7.7, we have a tmp directory which is marked "noexec". We are unfortunately not allowed to change this configuration.
Upon starting the performance-analyzer-agent-cli, we observe the following output (from my test machine running centos 7):
[root@centos bin]# sudo -u elasticsearch ./performance-analyzer-agent-cli
ERROR StatusLogger File not found in file system or classpath: /plugins/opendistro_performance_analyzer/pa_config/log4j2.xml
ERROR StatusLogger Reconfiguration failed: No configuration found for 'c387f44' at 'null' in 'null'
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.bouncycastle.jcajce.provider.drbg.DRBG (file:/usr/share/elasticsearch/plugins/opendistro_performance_analyzer/bcprov-jdk15on-1.60.jar) to constructor sun.security.provider.Sun()
WARNING: Please consider reporting this to the maintainers of org.bouncycastle.jcajce.provider.drbg.DRBG
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
java.lang.UnsatisfiedLinkError: /tmp/sqlite-3.8.11.2-78cb63cd-201f-4fdc-a745-94faabde1693-libsqlitejdbc.so: /tmp/sqlite-3.8.11.2-78cb63cd-201f-4fdc-a745-94faabde1693-libsqlitejdbc.so: failed to map segment from shared object: Operation not permitted
14:20:04.329 [Thread-1] ERROR com.amazon.opendistro.elasticsearch.performanceanalyzer.PerformanceAnalyzerApp - Error in ReaderMetricsProcessor...restarting, ExceptionCode: ReaderRestartProcessing
14:20:04.331 [Thread-1] ERROR com.amazon.opendistro.elasticsearch.performanceanalyzer.PerformanceAnalyzerApp - Error in ReaderMetricsProcessor...restarting, ExceptionCode: ReaderRestartProcessing
14:20:04.331 [Thread-1] ERROR com.amazon.opendistro.elasticsearch.performanceanalyzer.PerformanceAnalyzerApp - Error in ReaderMetricsProcessor...restarting, ExceptionCode: ReaderRestartProcessing
14:20:04.331 [Thread-1] ERROR com.amazon.opendistro.elasticsearch.performanceanalyzer.PerformanceAnalyzerApp - Error in ReaderMetricsProcessor...restarting, ExceptionCode: ReaderRestartProcessing
14:20:04.331 [Thread-1] ERROR com.amazon.opendistro.elasticsearch.performanceanalyzer.PerformanceAnalyzerApp - Error in ReaderMetricsProcessor...restarting, ExceptionCode: ReaderRestartProcessing
14:20:04.331 [Thread-1] ERROR com.amazon.opendistro.elasticsearch.performanceanalyzer.PerformanceAnalyzerApp - Error in ReaderMetricsProcessor...restarting, ExceptionCode: ReaderRestartProcessing
14:20:04.331 [Thread-1] ERROR com.amazon.opendistro.elasticsearch.performanceanalyzer.PerformanceAnalyzerApp - Error in ReaderMetricsProcessor...restarting, ExceptionCode: ReaderRestartProcessing
14:20:04.331 [Thread-1] ERROR com.amazon.opendistro.elasticsearch.performanceanalyzer.PerformanceAnalyzerApp - Error in ReaderMetricsProcessor...restarting, ExceptionCode: ReaderRestartProcessing
This is followed by thousands of similar "ReaderMetricsProcessor" errors every second, which continue nonstop until the process is killed.
Unfortunately, due to a second bug which I will be filing in a moment, it is not possible to override the temp directory path by using the ES_TMPDIR or ES_JAVA_OPTS config variables like you might expect. I will post more details on that in another issue.
For now we have worked around the problem by editing the performance-analyzer-agent-cli script and changing the PA_AGENT_JAVA_OPTS variable to additionally contain the string -Djava.io.tmpdir=/apps/elasticsearch/tmp (/apps/ is a volume which our team controls).
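For anyone who wants to confirm they are hitting the same condition, here is a rough diagnostic sketch. It assumes a Linux host where /proc/mounts is readable, and it checks whether the directory named by java.io.tmpdir sits on a mount carrying the "noexec" option. The class and method names (TmpDirCheck, isNoexec) are made up for illustration and are not part of the plugin.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of performance-analyzer.
final class TmpDirCheck {

    /**
     * Returns true if the most specific mount point containing dir carries
     * the "noexec" option. mountLines follows the /proc/mounts layout:
     * device, mount point, fs type, comma-separated options, ...
     * Simplification: escaped characters in mount points (e.g. \040) are
     * not decoded.
     */
    static boolean isNoexec(List<String> mountLines, Path dir) {
        Path best = null;
        boolean noexec = false;
        for (String line : mountLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4) {
                continue; // not a well-formed mount entry
            }
            Path mountPoint = Paths.get(fields[1]);
            if (!dir.startsWith(mountPoint)) {
                continue; // this mount does not contain dir
            }
            // Prefer the deepest mount point; on a tie the later entry wins,
            // since a later mount shadows an earlier one.
            if (best == null || mountPoint.getNameCount() >= best.getNameCount()) {
                best = mountPoint;
                noexec = Arrays.asList(fields[3].split(",")).contains("noexec");
            }
        }
        return noexec;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"))
                .toAbsolutePath().normalize();
        Path mounts = Paths.get("/proc/mounts");
        if (!Files.isReadable(mounts)) {
            System.out.println("Cannot read /proc/mounts; skipping check");
            return;
        }
        boolean noexec = isNoexec(Files.readAllLines(mounts), tmp);
        System.out.println(tmp + (noexec
                ? " is on a noexec mount; native libraries extracted here will fail to load"
                : " is not on a noexec mount"));
    }
}
```

Running it with the same -Djava.io.tmpdir value as the agent shows which directory the JVM would actually use. Symlinked temp directories are not resolved in this sketch.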
Thanks @shawnz for reporting the issue. This issue should be addressed when we fix #71
Thanks again for looking into this. However I am a bit concerned that simply solving issue #71 is not enough to fully solve this issue as well. If somebody ends up in this configuration without realizing that it's not supported, they could unknowingly create a serious resource problem (the log file had grown to 3 GB by the time I figured out what was going on).
I think in addition to solving #71, it would be helpful if the behaviour were changed to output only one error message in this unsupported situation and then terminate, rather than constantly outputting endless error messages in a loop.
Right! The PerformanceAnalyzer App is perpetually trying to restart the processing thread on failure. We will make the appropriate change to either terminate the process or stop the processing thread.
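To make the proposed behaviour concrete, here is a minimal sketch of a restart policy that logs once and stops on an unrecoverable error, and caps restarts for ordinary failures. It is not the actual PerformanceAnalyzerApp code; RestartPolicy, runWithRestarts, maxRestarts and backoffMillis are hypothetical names chosen for illustration.

```java
// Hypothetical sketch, not the plugin's implementation.
final class RestartPolicy {

    /**
     * Runs task, restarting it after ordinary runtime failures at most
     * maxRestarts times with a pause of backoffMillis between attempts.
     * A LinkageError (UnsatisfiedLinkError is a subclass) is treated as
     * unrecoverable: it is reported once and the loop ends.
     * Returns the number of attempts made.
     */
    static int runWithRestarts(Runnable task, int maxRestarts, long backoffMillis) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                task.run();
                return attempts; // task finished normally
            } catch (LinkageError e) {
                // Retrying cannot help if a native library cannot be loaded.
                System.err.println("Fatal error, not restarting: " + e);
                return attempts;
            } catch (RuntimeException e) {
                if (attempts > maxRestarts) {
                    System.err.println("Giving up after " + attempts + " attempts: " + e);
                    return attempts;
                }
                System.err.println("Restarting after failure: " + e);
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return attempts;
                }
            }
        }
    }

    public static void main(String[] args) {
        // Simulates the failure from this issue: one message, then exit.
        int attempts = runWithRestarts(() -> {
            throw new UnsatisfiedLinkError("failed to map segment from shared object");
        }, 5, 1000L);
        System.out.println("attempts = " + attempts);
    }
}
```

With a policy along these lines, the noexec case would produce a single error line instead of an unbounded log, and transient failures would still get a limited number of retries.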
Sounds good to me. Thank you all for your hard work on this project.