Docker start.py receives spurious SIGTERM signal

Open simranjeetc opened this issue 3 years ago • 29 comments

Describe the bug: I have a project, jdk8-b120, which is causing the OpenGrok server to shut down.

I want to disable sync and reindexing for this project. Please find my mirror.yml below:

# Empty config file for opengrok-mirror
ignore_errors: true

projects:
  jdk8.*:
    disabled: true
    incoming_check: false

Despite doing this, I still see a reindex getting triggered for this project (screenshot below). What am I missing here? Also, how can I find what is causing the shutdown? Could the suggester be the reason?

I have used 1.4 for quite some time and never had any such issues.

Screenshots: (image attached)

simranjeetc avatar Jun 24 '21 06:06 simranjeetc

What do you mean by 'shutdown' exactly?

vladak avatar Jun 24 '21 08:06 vladak

What do you mean by 'shutdown' exactly?

In my docker-compose file (below) you can see that I had to use the restart: always option because the container kept coming down. Looking at the logs (attached), I found that Tomcat was restarting time and again, as I see Starting Tomcat in the logs every time it does.

Let me know if you need any other info.

Please note that I have also done some manual restarts, so a few of those will be present in the attached logs as well.

opengrok-log.tar.gz

services:
  opengrok-1-7-11:
    container_name: opengrok-1.7.11
    image: opengrok/docker:1.7.11
    restart: always
    volumes:
      - 'opengrok_data1_7_11:/opengrok/data'
      - './src/:/opengrok/src/'  # source code
      - './etc/:/opengrok/etc/'  # folder contains configuration.xml
      - '/etc/localtime:/etc/localtime:ro'
      - './logs/:/usr/local/tomcat/logs/'
    ports:
      - "9090:8080/tcp"
      - "5001:5000/tcp"
    environment:
      SYNC_PERIOD_MINUTES: '30'
      INDEXER_OPT: '-H -P -G -R /opengrok/etc/read_only.xml'
    # Volumes store your data between container upgrades
volumes:
  opengrok_data1_7_11:
networks:
  opengrok-1-7-11:

simranjeetc avatar Jun 24 '21 08:06 simranjeetc

It seems all the occurrences in the log when Tomcat is going down are preceded by:

Received signal 15
Terminating Tomcat <subprocess.Popen object at 0x7f338f047fd0>

which means something sent the SIGTERM signal to the main container process, most probably externally. I don't think this is related to the processed data.

vladak avatar Jun 24 '21 13:06 vladak

For the record, I noticed the same thing when setting up http://demo.opengrok.org (#1740) however I was not sure whether this was caused by watchtower or Docker itself.

vladak avatar Jun 24 '21 13:06 vladak

It seems all the occurrences in the log when Tomcat is going down are preceded by:

Received signal 15
Terminating Tomcat <subprocess.Popen object at 0x7f338f047fd0>

which means something sent the SIGTERM signal to the main container process, most probably externally. I don't think this is related to the processed data.

On running below commands

docker container logs 2ef 2>&1 | grep -B 100 -i "Starting tomcat" | grep -i "Received signal 15" | uniq -c
Output: `7 Received signal 15`

You can ignore these as they would refer to docker-compose restarts issued by me.

docker container logs 2ef 2>&1 | grep -B 100 -i "Starting tomcat" | grep -i "Exception in thread Sync thread" | uniq -c
Output: `5 Exception in thread Sync thread:`

It is these instances that I am more concerned about. In the attached logs the count is 4; it is 5 for me because Tomcat shut down again after I shared those logs.

And all of these have a suggester exception preceding them. As can be seen in the logs, it is happening for different projects and not the same one every time.

23-Jun-2021 04:06:01.623 INFO [Thread-20] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-8080"]
23-Jun-2021 04:06:01.631 INFO [Thread-20] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]
23-Jun-2021 04:06:01.642 SEVERE [ForkJoinPool-122-worker-3] org.opengrok.suggest.Suggester.lambda$getRebuildRunnable$2 Could not rebuild suggester
        net.openhft.chronicle.hash.ChronicleHashClosedException: Access to ChronicleMap{name=defs, file=/opengrok/data/suggester/jdk8-b120/defs_search_count.db, identityHashCode=817420606} after close()
                at net.openhft.chronicle.map.VanillaChronicleMap.q(VanillaChronicleMap.java:431)
                at net.openhft.chronicle.map.VanillaChronicleMap.mapContext(VanillaChronicleMap.java:443)
                at net.openhft.chronicle.map.VanillaChronicleMap.optimizedGet(VanillaChronicleMap.java:579)
                at net.openhft.chronicle.map.VanillaChronicleMap.get(VanillaChronicleMap.java:567)
                at java.base/java.util.concurrent.ConcurrentMap.getOrDefault(ConcurrentMap.java:88)
                at org.opengrok.suggest.popular.impl.chronicle.ChronicleMapAdapter.get(ChronicleMapAdapter.java:62)
                at org.opengrok.suggest.SuggesterProjectData.lambda$getSearchCounts$2(SuggesterProjectData.java:476)
                at org.opengrok.suggest.SuggesterProjectData$WFSTInputIterator.weight(SuggesterProjectData.java:603)
                at org.apache.lucene.search.suggest.SortedInputIterator.sort(SortedInputIterator.java:184)
                at org.apache.lucene.search.suggest.SortedInputIterator.<init>(SortedInputIterator.java:76)
                at org.apache.lucene.search.suggest.SortedInputIterator.<init>(SortedInputIterator.java:62)
                at org.apache.lucene.search.suggest.fst.WFSTCompletionLookup$WFSTInputIterator.<init>(WFSTCompletionLookup.java:273)
                at org.apache.lucene.search.suggest.fst.WFSTCompletionLookup.build(WFSTCompletionLookup.java:115)
                at org.opengrok.suggest.SuggesterProjectData.build(SuggesterProjectData.java:266)
                at org.opengrok.suggest.SuggesterProjectData.build(SuggesterProjectData.java:253)
                at org.opengrok.suggest.SuggesterProjectData.rebuild(SuggesterProjectData.java:238)
                at org.opengrok.suggest.Suggester.lambda$getRebuildRunnable$2(Suggester.java:347)
                at java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1407)
                at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
                at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
                at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
                at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
                at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
23-Jun-2021 04:06:01.650 INFO [ForkJoinPool.commonPool-worker-15] org.opengrok.suggest.Suggester.shutdownAndAwaitTermination Suggesters for [jdk8-b120] were successfully rebuilt (took 0 seconds)
23-Jun-2021 04:06:01.749 INFO [Thread-20] org.opengrok.indexer.configuration.WatchDogService.stop Watchdog stopped
23-Jun-2021 04:06:01.767 SEVERE [Thread-20] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ROOT] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@2e0b17a2]) and a value of type [net.openhft.chronicle.hash.impl.ContextHolder] (value [net.openhft.chronicle.hash.impl.ContextHolder@3d4211e1]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
23-Jun-2021 04:06:01.805 INFO [Thread-20] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler ["http-nio-8080"]
23-Jun-2021 04:06:01.836 INFO [Thread-20] org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler ["http-nio-8080"]
could not get configuration from web application on http://localhost:8080/
Exception in thread Sync thread:

simranjeetc avatar Jun 24 '21 14:06 simranjeetc

I don't see the Exception in thread Sync thread in the attached log file. Could you share the relevant snippet?

vladak avatar Jun 24 '21 14:06 vladak

I used this to narrow down to the snippets around Exception in thread Sync thread:

cat opengrok-v2.log | grep -i -B 1000 "Starting tomcat" | grep -i -A 50 -B 100 "exception in thread sync thread" | less

simranjeetc avatar Jun 24 '21 17:06 simranjeetc

The exception in the Sync thread was fixed in #3651 and the fix is part of 1.7.13; however, this is only a symptom of the problem. The exception while saving the configuration to disk happened because it was not possible to retrieve the configuration from the web app, which in turn failed because Tomcat was going down for some reason.
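
For illustration only, a guard of the following shape would avoid the secondary failure (the configuration fetched from the web app coming back as None once Tomcat is already stopping). The names merely mirror the traceback quoted later in this thread; this is a sketch and not necessarily what #3651 actually changed:

def save_config(logger, uri, config_path, fetch_config):
    # Sketch: 'fetch_config' stands in for the REST call that start.py
    # makes against the web application at 'uri'.
    config = fetch_config(uri)
    if config is None:
        # Web app unreachable or shutting down; keep the old file intact.
        logger.error("could not get configuration from %s, not overwriting %s",
                     uri, config_path)
        return
    with open(config_path, "w") as config_file:
        config_file.write(config)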

vladak avatar Jun 24 '21 18:06 vladak

25-Jun-2021 08:51:18.228 INFO [ForkJoinPool.commonPool-worker-11] org.opengrok.suggest.Suggester.rebuild Rebuilding the following suggesters: [verity]
25-Jun-2021 08:51:30.770 INFO [Thread-20] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-8080"]
25-Jun-2021 08:51:30.779 INFO [Thread-20] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]
25-Jun-2021 08:51:30.878 INFO [Thread-20] org.opengrok.indexer.configuration.WatchDogService.stop Watchdog stopped
25-Jun-2021 08:51:30.963 SEVERE [ForkJoinPool-274-worker-3] org.opengrok.suggest.Suggester.lambda$getRebuildRunnable$2 Could not rebuild suggester
        net.openhft.chronicle.hash.ChronicleHashClosedException: Access to ChronicleMap{name=refs, file=/opengrok/data/suggester/verity/refs_search_count.db, identityHashCod
e=752444000} after close()
                at net.openhft.chronicle.map.VanillaChronicleMap.q(VanillaChronicleMap.java:431)
                at net.openhft.chronicle.map.VanillaChronicleMap.mapContext(VanillaChronicleMap.java:443)
                at net.openhft.chronicle.map.VanillaChronicleMap.optimizedGet(VanillaChronicleMap.java:579)
                at net.openhft.chronicle.map.VanillaChronicleMap.get(VanillaChronicleMap.java:567)
                at java.base/java.util.concurrent.ConcurrentMap.getOrDefault(ConcurrentMap.java:88)
                at org.opengrok.suggest.popular.impl.chronicle.ChronicleMapAdapter.get(ChronicleMapAdapter.java:62)
                at org.opengrok.suggest.SuggesterProjectData.lambda$getSearchCounts$2(SuggesterProjectData.java:476)
                at org.opengrok.suggest.SuggesterProjectData$WFSTInputIterator.weight(SuggesterProjectData.java:603)
                at org.apache.lucene.search.suggest.SortedInputIterator.sort(SortedInputIterator.java:184)
                at org.apache.lucene.search.suggest.SortedInputIterator.<init>(SortedInputIterator.java:76)
                at org.apache.lucene.search.suggest.SortedInputIterator.<init>(SortedInputIterator.java:62)
                at org.apache.lucene.search.suggest.fst.WFSTCompletionLookup$WFSTInputIterator.<init>(WFSTCompletionLookup.java:273)
                at org.apache.lucene.search.suggest.fst.WFSTCompletionLookup.build(WFSTCompletionLookup.java:115)
                at org.opengrok.suggest.SuggesterProjectData.build(SuggesterProjectData.java:266)
                at org.opengrok.suggest.SuggesterProjectData.build(SuggesterProjectData.java:253)
                at org.opengrok.suggest.SuggesterProjectData.rebuild(SuggesterProjectData.java:238)
                at org.opengrok.suggest.Suggester.lambda$getRebuildRunnable$2(Suggester.java:347)
                at java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1407)
                at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
                at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
                at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
                at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
                at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
25-Jun-2021 08:51:30.965 INFO [ForkJoinPool.commonPool-worker-11] org.opengrok.suggest.Suggester.shutdownAndAwaitTermination Suggesters for [verity] were successfully rebuil
t (took 12 seconds)
25-Jun-2021 08:51:30.966 INFO [ForkJoinPool.commonPool-worker-5] org.opengrok.suggest.Suggester.rebuild Rebuilding the following suggesters: [verity]
25-Jun-2021 08:51:30.972 SEVERE [ForkJoinPool-275-worker-3] org.opengrok.suggest.Suggester.lambda$getRebuildRunnable$2 Could not rebuild suggester
	net.openhft.chronicle.hash.ChronicleHashClosedException: Access to ChronicleMap{name=defs, file=/opengrok/data/suggester/verity/defs_search_count.db, identityHashCod
e=2041243366} after close()
                at net.openhft.chronicle.map.VanillaChronicleMap.q(VanillaChronicleMap.java:409)
                at net.openhft.chronicle.map.VanillaChronicleMap.mapContext(VanillaChronicleMap.java:443)
                at net.openhft.chronicle.map.VanillaChronicleMap.optimizedGet(VanillaChronicleMap.java:579)
                at net.openhft.chronicle.map.VanillaChronicleMap.get(VanillaChronicleMap.java:567)
                at java.base/java.util.concurrent.ConcurrentMap.getOrDefault(ConcurrentMap.java:88)
                at org.opengrok.suggest.popular.impl.chronicle.ChronicleMapAdapter.get(ChronicleMapAdapter.java:62)
                at org.opengrok.suggest.SuggesterProjectData.lambda$getSearchCounts$2(SuggesterProjectData.java:476)
                at org.opengrok.suggest.SuggesterProjectData$WFSTInputIterator.weight(SuggesterProjectData.java:603)
                at org.apache.lucene.search.suggest.SortedInputIterator.sort(SortedInputIterator.java:184)
                at org.apache.lucene.search.suggest.SortedInputIterator.<init>(SortedInputIterator.java:76)
                at org.apache.lucene.search.suggest.SortedInputIterator.<init>(SortedInputIterator.java:62)
                at org.apache.lucene.search.suggest.fst.WFSTCompletionLookup$WFSTInputIterator.<init>(WFSTCompletionLookup.java:273)
                at org.apache.lucene.search.suggest.fst.WFSTCompletionLookup.build(WFSTCompletionLookup.java:115)
                at org.opengrok.suggest.SuggesterProjectData.build(SuggesterProjectData.java:266)
                at org.opengrok.suggest.SuggesterProjectData.build(SuggesterProjectData.java:253)
                at org.opengrok.suggest.SuggesterProjectData.rebuild(SuggesterProjectData.java:238)
                at org.opengrok.suggest.Suggester.lambda$getRebuildRunnable$2(Suggester.java:347)
                at java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1407)
                at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
                at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
                at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
                at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
                at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
25-Jun-2021 08:51:30.974 INFO [ForkJoinPool.commonPool-worker-5] org.opengrok.suggest.Suggester.shutdownAndAwaitTermination Suggesters for [verity] were successfully rebuilt
 (took 0 seconds)
25-Jun-2021 08:51:30.987 SEVERE [Thread-20] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ROOT] created a ThreadLocal wit
h key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@1d2e0f1c]) and a value of type [net.openhft.chronicle.hash.impl.ContextHolder] (value [net.openhft.chroni
cle.hash.impl.ContextHolder@5b017732]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable me
mory leak.
25-Jun-2021 08:51:30.988 SEVERE [Thread-20] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ROOT] created a ThreadLocal wit
h key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@8c5f5b]) and a value of type [net.openhft.chronicle.hash.impl.ContextHolder] (value [net.openhft.chronicl
e.hash.impl.ContextHolder@74d214c1]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memo
ry leak.
25-Jun-2021 08:51:30.989 SEVERE [Thread-20] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ROOT] created a ThreadLocal wit
h key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@8c5f5b]) and a value of type [net.openhft.chronicle.hash.impl.ContextHolder] (value [net.openhft.chronicl
e.hash.impl.ContextHolder@44abbbe0]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memo
ry leak.
25-Jun-2021 08:51:30.990 SEVERE [Thread-20] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ROOT] created a ThreadLocal wit
h key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@8c5f5b]) and a value of type [net.openhft.chronicle.hash.impl.ContextHolder] (value [net.openhft.chronicl
e.hash.impl.ContextHolder@33fe07a5]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memo
ry leak.
25-Jun-2021 08:51:30.991 SEVERE [Thread-20] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ROOT] created a ThreadLocal wit
h key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@8c5f5b]) and a value of type [net.openhft.chronicle.hash.impl.ContextHolder] (value [net.openhft.chronicl
e.hash.impl.ContextHolder@5469501d]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memo
ry leak.
25-Jun-2021 08:51:30.992 SEVERE [Thread-20] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ROOT] created a ThreadLocal wit
h key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@1d2e0f1c]) and a value of type [net.openhft.chronicle.hash.impl.ContextHolder] (value [net.openhft.chroni
cle.hash.impl.ContextHolder@90a076b]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable mem
ory leak.
25-Jun-2021 08:51:30.992 SEVERE [Thread-20] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ROOT] created a ThreadLocal wit
h key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@8c5f5b]) and a value of type [net.openhft.chronicle.hash.impl.ContextHolder] (value [net.openhft.chronicl
e.hash.impl.ContextHolder@6fea3b00]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memo
ry leak.
25-Jun-2021 08:51:30.993 SEVERE [Thread-20] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ROOT] created a ThreadLocal wit
h key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@53bdcce3]) and a value of type [net.openhft.chronicle.hash.impl.ContextHolder] (value [net.openhft.chroni
cle.hash.impl.ContextHolder@ad5a8a0]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable mem
ory leak.
25-Jun-2021 08:51:31.023 INFO [Thread-20] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler ["http-nio-8080"]
25-Jun-2021 08:51:31.032 INFO [Thread-20] org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler ["http-nio-8080"]
could not get configuration from web application on http://localhost:8080/
Exception in thread Sync thread:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/scripts/start.py", line 350, in project_syncer
    save_config(logger, uri, config_path)
  File "/scripts/start.py", line 250, in save_config
    config_file.write(config)
TypeError: write() argument must be str, not None

This is the latest Tomcat shutdown. Let me know if you want me to try something to get you more information.

simranjeetc avatar Jun 25 '21 09:06 simranjeetc

Is there anything relevant before these log entries?

The 2 lines:

25-Jun-2021 08:51:30.770 INFO [Thread-20] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-8080"]
25-Jun-2021 08:51:30.779 INFO [Thread-20] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]

basically say that Tomcat is stopping. There is no info on why it is stopping. The first line actually comes from the Tomcat 10 java/org/apache/catalina/core/StandardService.java#stopInternal() method as it stops the Connectors first: https://github.com/apache/tomcat/blob/197edd2f4bb50bb46ce4c35e6d49765916662ccc/java/org/apache/catalina/core/StandardService.java#L470-L497

I think the rest of the log is just fallout: the suggester failing because the web app is yanked out from underneath it, and the start.py API calls failing for the same reason.

The next step would be raising the log level of Tomcat itself. Something like this (assumes the container is running):

  • docker container cp <container name>:/usr/local/tomcat/conf /tmp/conf
  • change the docker compose volumes to contain: '/tmp/conf:/usr/local/tomcat/conf'
  • change /tmp/conf/logging.properties to contain .level = ALL and change java.util.logging.ConsoleHandler.level to FINEST
  • start the container with this configuration and wait for Tomcat to stop

This will produce lots of output. Hopefully when Tomcat is stopping the next time there will be some relevant log messages.

vladak avatar Jun 25 '21 19:06 vladak

A couple of observations:

  • This time the restart happened after approximately 21 hours, which was quite a bit more than what I have seen before (4-5 hours).
  • There has been less activity on the OpenGrok instance in the last 24 hours.
  • This time, instead of doing a restart with docker-compose restart, I did a docker-compose down and then docker-compose up -d.

Also attaching the logging.properties for your reference.

handlers = java.util.logging.ConsoleHandler

.handlers = java.util.logging.ConsoleHandler

############################################################
# Handler specific properties.
# Describes specific configuration info for Handlers.
############################################################

java.util.logging.ConsoleHandler.level = FINEST
java.util.logging.ConsoleHandler.formatter = org.apache.juli.OneLineFormatter

############################################################
# Facility specific properties.
# Provides extra control for each logger.
############################################################

org.apache.catalina.core.ContainerBase.[Catalina].[localhost].level = ALL
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].handlers = java.util.logging.ConsoleHandler

org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].level = ALL
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].handlers =  java.util.logging.ConsoleHandler

org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host-manager].level = ALL
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host-manager].handlers = java.util.logging.ConsoleHandler

simranjeetc avatar Jun 26 '21 19:06 simranjeetc

The log level was increased; however, there is no information about why the StandardService was stopped:

INFO: Refreshing searcher managers to: http://localhost:8080/
Jun 26, 2021 5:54:18 PM org.opengrok.indexer.util.Statistics logIt
INFO: Indexer finished (took 0:01:52)

Sync done
Saving configuration to /opengrok/etc/configuration.xml
26-Jun-2021 17:54:47.880 INFO [Thread-20] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-8080"]
26-Jun-2021 17:54:47.889 INFO [Thread-20] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]
26-Jun-2021 17:54:47.989 FINE [Thread-20] org.apache.catalina.core.StandardContext.filterStop Stopping filters
26-Jun-2021 17:54:47.989 FINE [Thread-20] org.apache.catalina.core.StandardContext.filterStop  Stopping filter 'ExpiresHalfHourFilter'
26-Jun-2021 17:54:47.990 FINE [Thread-20] org.apache.catalina.core.StandardContext.filterStop  Stopping filter 'AuthorizationFilter'
26-Jun-2021 17:54:47.990 FINE [Thread-20] org.apache.catalina.core.StandardContext.filterStop  Stopping filter 'CookieFilter'
26-Jun-2021 17:54:47.990 FINE [Thread-20] org.apache.catalina.core.StandardContext.filterStop  Stopping filter 'Tomcat WebSocket (JSR356) Filter'
26-Jun-2021 17:54:47.990 FINE [Thread-20] org.apache.catalina.core.StandardContext.filterStop  Stopping filter 'ExpiresOneDayFilter'
26-Jun-2021 17:54:47.991 FINE [Thread-20] org.apache.catalina.core.StandardContext.filterStop  Stopping filter 'ExpiresOneYearFilter'
26-Jun-2021 17:54:47.991 FINE [Thread-20] org.apache.catalina.core.StandardContext.filterStop  Stopping filter 'StatisticsFilter'
26-Jun-2021 17:54:48.007 INFO [Thread-20] org.opengrok.indexer.configuration.WatchDogService.stop Watchdog stopped

Maybe you can use the big hammer and add:

.level = ALL

to the logging.properties file.

vladak avatar Jun 28 '21 07:06 vladak

I tried something else too. I noticed that whenever there is a restart, there is a Lucene exception before it.

WARNING: ERROR addFile(): /opengrok/src/jdk8-b120/langtools/test/tools/javac/limits/LongName.java
java.lang.IllegalArgumentException: Document contains at least one immense term in field="defs" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 110, 109, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 48, 49, 50, 51]...', original message: bytes can be at most 32766 in length; got 140000
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:981)
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:524)
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:488)

This is caused by the jdk8-b120 project, so I removed that project from the src directory. Since then Tomcat has been up for the last 24 hours. So either this exception is the cause, or it was making the shutdown happen more frequently. There are instances, however, where this exception occurred and the server did not shut down.

I only need this project to be indexed once; after that it can be ignored. I tried doing this using the mirror.yml in https://github.com/oracle/opengrok/issues/3650#issue-928907847, but it is still getting indexed every time. Can you please tell me what else I need to do to disable future reindexing of this project?

Meanwhile, I will change the .level to ALL and report back with logs.

simranjeetc avatar Jun 28 '21 14:06 simranjeetc

I think the /opengrok/src/jdk8-b120/langtools/test/tools/javac/limits/LongName.java exception is just a red herring. In the last attached log it happens many more times (in fact I think it happens during each reindex of the project) than the Tomcat stopping. It would be highly suspicious if Tomcat stopped because one of its applications experienced a problem.

Trying https://github.com/openjdk/jdk/tree/jdk8-b120 with opengrok/docker:master to see if I can reproduce this.

vladak avatar Jun 28 '21 14:06 vladak

Jumping into this issue with a 'me too' on the shutdown. I see it on a semi-regular basis over the past 3 releases minimum. Maybe once every two days I'll find that the opengrok docker container isn't running (I need to learn the auto-restart tricks). It always happens near the end of a periodic indexing operation, seemingly around the suggester rebuild time. I just had it happen now while on release 1.7.13. My log levels are not high, so I do not think there is anything new in my information that isn't already listed above.

...
Jun 28, 2021 2:18:20 PM org.opengrok.indexer.index.Indexer refreshSearcherManagers
INFO: Refreshing searcher managers to: http://localhost:8080/
Jun 28, 2021 2:18:20 PM org.opengrok.indexer.util.Statistics logIt
INFO: Indexer finished (took 48.747 seconds)

Sync done
28-Jun-2021 14:21:31.467 INFO [Thread-27] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-8080"]
28-Jun-2021 14:21:31.476 INFO [Thread-27] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]
28-Jun-2021 14:21:31.502 INFO [Thread-27] org.apache.catalina.core.StandardWrapper.unload Waiting for [1] instance(s) to be deallocated for Servlet [org.opengrok.web.api.v1.RestApp]
Saving configuration to /opengrok/etc/configuration.xml
Waiting for reindex to be triggered
28-Jun-2021 14:21:31.619 INFO [Thread-27] org.opengrok.indexer.configuration.WatchDogService.stop Watchdog stopped
28-Jun-2021 14:22:37.714 SEVERE [ForkJoinPool-113-worker-19] org.opengrok.suggest.Suggester.lambda$getRebuildRunnable$2 Could not rebuild suggester
        net.openhft.chronicle.hash.ChronicleHashClosedException: Access to ChronicleMap{name=type, file=/opengrok/data/suggester/myproject/type_search_count.db, identityHashCode=1033479179} after close()
                at net.openhft.chronicle.map.VanillaChronicleMap.q(VanillaChronicleMap.java:409)
                at net.openhft.chronicle.map.VanillaChronicleMap.mapContext(VanillaChronicleMap.java:443)
                at net.openhft.chronicle.map.VanillaChronicleMap.optimizedGet(VanillaChronicleMap.java:579)
                at net.openhft.chronicle.map.VanillaChronicleMap.get(VanillaChronicleMap.java:567)
                at java.base/java.util.concurrent.ConcurrentMap.getOrDefault(ConcurrentMap.java:88)
                at org.opengrok.suggest.popular.impl.chronicle.ChronicleMapAdapter.get(ChronicleMapAdapter.java:62)
                at org.opengrok.suggest.SuggesterProjectData.lambda$getSearchCounts$2(SuggesterProjectData.java:476)
                at org.opengrok.suggest.SuggesterProjectData$WFSTInputIterator.weight(SuggesterProjectData.java:603)
                at org.apache.lucene.search.suggest.SortedInputIterator.sort(SortedInputIterator.java:184)
                at org.apache.lucene.search.suggest.SortedInputIterator.<init>(SortedInputIterator.java:76)
                at org.apache.lucene.search.suggest.SortedInputIterator.<init>(SortedInputIterator.java:62)
                at org.apache.lucene.search.suggest.fst.WFSTCompletionLookup$WFSTInputIterator.<init>(WFSTCompletionLookup.java:273)
                at org.apache.lucene.search.suggest.fst.WFSTCompletionLookup.build(WFSTCompletionLookup.java:115)
                at org.opengrok.suggest.SuggesterProjectData.build(SuggesterProjectData.java:266)
                at org.opengrok.suggest.SuggesterProjectData.build(SuggesterProjectData.java:253)
                at org.opengrok.suggest.SuggesterProjectData.rebuild(SuggesterProjectData.java:238)
                at org.opengrok.suggest.Suggester.lambda$getRebuildRunnable$2(Suggester.java:347)
                at java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1407)
                at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
                at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
                at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
                at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
                at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
28-Jun-2021 14:22:37.716 INFO [ForkJoinPool.commonPool-worker-23] org.opengrok.suggest.Suggester.shutdownAndAwaitTermination Suggesters for [myproject] were successfully rebuilt (took 1 minute 27 seconds)
...
<repeating pattern for each project, about a dozen, but with a rebuild of 0 seconds until things shutdown>
...
28-Jun-2021 14:22:37.754 INFO [ForkJoinPool.commonPool-worker-1] org.opengrok.suggest.Suggester.shutdownAndAwaitTermination Suggesters for [someotherproject] were successfully rebuilt (took 0 seconds)
28-Jun-2021 14:22:37.784 INFO [Thread-27] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler ["http-nio-8080"]
28-Jun-2021 14:22:37.837 INFO [Thread-27] org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler ["http-nio-8080"]

(Replaced my project names with generic names in the log output.)

Seems like the severe errors are an artifact of things shutting down out from under it, but I don't see anything in the logs leading up to it all that indicates what happened to trigger it.

jwehler avatar Jun 28 '21 14:06 jwehler

Seems like I reproduced this on my laptop with the latest master Docker image:

opengrok-test | INFO: Indexer finished (took 0:01:22)
opengrok-test | 
opengrok-test | 29-Jun-2021 07:32:36.009 INFO [Thread-14] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-8080"]
opengrok-test | Sync done
opengrok-test | 29-Jun-2021 07:32:36.015 INFO [Thread-14] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]
opengrok-test | 29-Jun-2021 07:32:36.044 INFO [Thread-14] org.opengrok.indexer.configuration.WatchDogService.stop Watchdog stopped
opengrok-test | 29-Jun-2021 07:32:36.065 SEVERE [ForkJoinPool-2305-worker-3] org.opengrok.suggest.Suggester.lambda$getRebuildRunnable$2 Could not rebuild suggester

...

opengrok-test | could not get configuration from web application: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
opengrok-test | Waiting for reindex to be triggered
opengrok-test exited with code 0

It took more than 24 hours (the indexer period was set to 5 minutes). Next thing I will run the container with full Tomcat debug and also killsnoop-bpfcc and execsnoop-bpfcc.

vladak avatar Jun 29 '21 09:06 vladak

The next run ended with the same result after some 24 hours (I could not really tell the precise time because of a power outage at night that might have put my laptop to sleep for some time; also, I did not properly record the startup time).

The container went down with:

opengrok-test | INFO: Indexer finished (took 55.313 seconds)
opengrok-test | 
opengrok-test | Received signal 15
opengrok-test | Terminating Tomcat <subprocess.Popen object at 0x7fc1ea2a3780>
opengrok-test | 30-Jun-2021 07:44:00.366 FINE [Thread-14] org.apache.catalina.util.LifecycleBase.setStateInternal Setting state for [StandardServer[8005]] to [STOPPING_PREP]
opengrok-test | 30-Jun-2021 07:44:00.366 FINE [Thread-14] org.apache.catalina.util.LifecycleBase.setStateInternal Setting state for [StandardServer[8005]] to [STOPPING]
opengrok-test | 30-Jun-2021 07:44:00.366 FINE [Thread-14] org.apache.catalina.mbeans.GlobalResourcesLifecycleListener.destroyMBeans Destroying MBeans for Global JNDI Resources
opengrok-test | Sync done
opengrok-test | 30-Jun-2021 07:44:00.367 FINE [Thread-14] org.apache.catalina.util.LifecycleBase.setStateInternal Setting state for [StandardService[Catalina]] to [STOPPING_PREP]
opengrok-test | 30-Jun-2021 07:44:00.367 INFO [Thread-14] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-8080"]
opengrok-test | 30-Jun-2021 07:44:00.368 FINE [Thread-14] org.apache.tomcat.util.net.AbstractEndpoint.unlockAccept About to unlock socket for:/172.21.0.2:8080
opengrok-test | 30-Jun-2021 07:44:00.368 FINE [Thread-14] org.apache.tomcat.util.net.AbstractEndpoint.unlockAccept Socket unlock completed for:/172.21.0.2:8080
opengrok-test | 30-Jun-2021 07:44:00.373 INFO [Thread-14] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]

and the signal trace contained this:

07:43:32  567    systemd-udevd    15   1720360 0
07:43:32  567    systemd-udevd    15   1720366 0
07:43:32  567    systemd-udevd    15   1720370 0
07:43:56  1719906 java             9    127991 0
07:43:56  1568456 python3          15   127500 0
07:43:56  1568456 python3          15   127501 0
07:43:56  1568456 python3          15   127502 0
07:43:56  1568456 python3          15   127503 0
07:43:56  1568456 python3          15   127504 0
07:43:56  1568456 python3          15   127505 0
07:43:56  1568456 python3          15   127507 0
07:45:01  531    systemd-journal  0    1024   0
07:45:01  531    systemd-journal  0    1062   0
07:45:01  531    systemd-journal  0    1063   0
07:45:01  531    systemd-journal  0    1913   0
07:45:01  531    systemd-journal  0    1017   0

which did not match the main start.py process as I captured the process tree when the container was started:

1568348 ?        Sl     0:29 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 2b2f27fddb7d32476d79ea1a60179805b6bd4bd5ad5d42ccc4d6b37c0e01949a -address /run/containerd/containerd.sock
1568370 ?        Ssl    0:13  \_ python3 /scripts/start.py                      
1568463 ?        Sl   100:29      \_ /usr/local/openjdk-11/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Dignore.endorsed.dirs= -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp org.apache.catalina.startup.Bootstrap start
1633488 ?        S      0:00      \_ python3 /scripts/start.py                  
1633489 ?        S      0:00      \_ python3 /scripts/start.py                  
1633490 ?        S      0:00      \_ python3 /scripts/start.py                  
1633491 ?        S      0:00      \_ python3 /scripts/start.py                  
1633492 ?        S      0:00      \_ python3 /scripts/start.py                  
1633493 ?        Sl     0:00      \_ python3 /scripts/start.py                  
1633514 ?        Sl     0:00      |   \_ /usr/bin/python3 /usr/local/bin/opengrok-reindex-project --printoutput --jar /opengrok/lib/opengrok.jar -U http://localhost:8080/ -P jdk8-b120 -- -r dirbased -G -m 256 --leadingWildCards on -c /usr/local/bin/ctags -U http://localhost:8080/ -H jdk8-b120
1633526 ?        Sl     1:59      |       \_ /usr/local/openjdk-11/bin/java -Dorg.opengrok.indexer.history.Subversion=/usr/bin/svn -Dorg.opengrok.indexer.history.Mercurial=/usr/bin/hg -jar /opengrok/lib/opengrok.jar -R /tmp/tmplau_snfw -r dirbased -G -m 256 --leadingWildCards on -c /usr/local/bin/ctags -U http://localhost:8080/ -H jdk8-b120

so it is not possible to tell which process actually sent the SIGTERM. Anyhow, the SIGTERM was somehow delivered to the main process. It could be a problem with the tracing, or with how the signal was delivered; for instance, it could be delivered by means other than the kill syscall, as noted in https://stackoverflow.com/a/62434615/11582827

Initially I thought there was some way that e.g. dockerd could send the signal to the container; however, I am starting to think there is some weird way in which the synchronization done in do_sync() could lead to the SIGTERM being sent to the main process. do_sync() works by creating a pool of worker processes to handle the synchronization of multiple projects in parallel. When the work is done, these processes are terminated with SIGTERM, as visible in the killsnoop-bpfcc log above. There were 8 worker processes (corresponding to multiprocessing.cpu_count()) that were SIGTERM'd after the sync was done. This is done automatically by the Pool context manager: https://github.com/oracle/opengrok/blob/552e80ca40a4badbc5ce294993d77081c0ca0921/tools/src/main/python/opengrok_tools/sync.py#L103-L118
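
For illustration, here is a minimal, self-contained Python sketch (not the OpenGrok sync code) of this mechanism: leaving the Pool context manager terminates the worker processes with SIGTERM, which matches the python3 entries in the killsnoop-bpfcc output above. Note that in this sketch only the workers are signalled, not the parent process.

# Minimal sketch: exiting a multiprocessing.Pool context manager terminates
# the still-idle worker processes with SIGTERM.
import multiprocessing
import signal

def square(n):
    return n * n

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        print(pool.map(square, range(8)))
        workers = multiprocessing.active_children()  # the pool's worker processes
    # Leaving the 'with' block calls Pool.terminate(), which sends SIGTERM to
    # the idle workers; a negative exit code -N means "killed by signal N".
    for p in workers:
        p.join()
        print(p.pid, p.exitcode, p.exitcode == -signal.SIGTERM)

On Linux this prints the squares followed by each worker's PID and an exit code of -15, i.e. killed by SIGTERM.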

To verify the hypothesis that this is caused by do_sync(), I am going to let a container run with SYNC_PERIOD_MINUTES: '0' for a couple of days. Next, I will probably add some better signal tracing to start.py using signal.sigwaitinfo().

vladak avatar Jun 30 '21 10:06 vladak

Wow, the container running opengrok/docker:master with SYNC_PERIOD_MINUTES: '0' (which means there was just a single sync - when the container was started) ended after some 10 hours with:

opengrok-test | Received signal 15
opengrok-test | Terminating Tomcat <subprocess.Popen object at 0x7f61ee975128>
opengrok-test | 30-Jun-2021 19:34:21.143 INFO [Thread-14] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-8080"]
opengrok-test | 30-Jun-2021 19:34:21.149 INFO [Thread-14] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]
opengrok-test | 30-Jun-2021 19:34:21.176 INFO [Thread-14] org.opengrok.indexer.configuration.WatchDogService.stop Watchdog stopped
opengrok-test exited with code 0

which means this problem is not related to the syncing.

In the meantime I ran into https://stackoverflow.com/questions/68197507/why-does-sigterm-handling-mess-up-pool-map-handling while playing with do_sync().

vladak avatar Jul 01 '21 08:07 vladak

@vladak Any luck in figuring out the reason behind shutdowns and any fixes/workarounds for the same?

simranjeetc avatar Jul 07 '21 06:07 simranjeetc

Until a fix is in place, I ended up doing:

docker update --restart=always <opengrok-container>

and that has helped. Had one recovery since my last manual restart two days ago:

docker inspect --format '{{json .RestartCount}}' <opengrok-container>
1

Sadly you can't use the --restart=on-failure option because the exit code appears to be 0 when this happens.

jwehler avatar Jul 14 '21 02:07 jwehler

Until a fix is in place, I ended up doing:

docker update --restart=always <opengrok-container>

Agreed. I have been using docker-compose with the restart policy set to always for the last couple of weeks. With that, at least OpenGrok serves requests without much of a glitch.

simranjeetc avatar Jul 14 '21 05:07 simranjeetc

I modified start.py to serve basically as a signal waiter, so main() basically looks like this:

    signal.signal(signal.SIGTERM, signal_handler)
    signal.signal(signal.SIGINT, signal_handler)

    logger.info("Waiting for signal on PID {}".format(os.getpid()))
    siginfo = signal.sigwaitinfo({signal.SIGTERM, signal.SIGINT})
    print("got %d from %d by user %d\n" % (siginfo.si_signo,
                                           siginfo.si_pid,
                                           siginfo.si_uid))

There is no sync done. It does not even start Tomcat.

After some time (I cannot tell how much exactly, since the run was interrupted by putting the laptop to sleep over the course of a couple of days, and the signal info print statement above does not include a timestamp) it failed with:

opengrok-dev | 2021-07-07 14:16:13,291     INFO opengrok_tools | Waiting for signal on PID 1
opengrok-dev | got 15 from 0 by user 0

I used the modified killsnoop-bpfcc (https://github.com/iovisor/bcc/issues/3520) to also capture the signals sent with sigqueue(); however, there is nothing in the log pertaining to the processes running in the container.

Now, if I Ctrl-C the container running in the foreground, it will also produce the "got 15 from 0 by user 0\n\n" message (just not on the container console; this probably has to do with shutting down the logging, and in order to get the message I had to use strace on start.py). However, in that case I can see related events in the BPF log:

11:17:20  253725 runc             15   253625 0
11:17:20  1894   dockerd          2    253574 0
11:17:20  1894   dockerd          2    253588 0

The lack of information about the signal sender can mean one of these things:

  • it was sent by the kernel
  • it was sent by other means (there are a couple of system calls in Linux that can also send signals, and those are not instrumented in my modified version of killsnoop-bpfcc)
  • the signal was not sent at all and it is just some Python artifact/bug

One of the things I'd like to try next (heh, this is getting a bit like a bug update in the corporate world, which I would like to avoid in this free-time project where no one should expect anything from the project owners) is to run the tomcat:10-jdk11 image (which the OpenGrok image is based on) with a minimal Python program (like the one above) to see if the behavior can be reproduced there as well. I am also thinking about bisecting the Docker layers.
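
Such a minimal Python program could look roughly like the sketch below. It is only a debugging aid (not the shipped start.py) and it additionally blocks the signals so that sigwaitinfo() receives them reliably instead of an asynchronous handler:

#!/usr/bin/env python3
# Debugging sketch: wait for SIGTERM/SIGINT and report which PID/UID sent it.
import os
import signal

def main():
    # Block the signals in this thread so they stay pending and are picked up
    # synchronously by sigwaitinfo().
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM, signal.SIGINT})
    print("Waiting for signal on PID {}".format(os.getpid()), flush=True)
    siginfo = signal.sigwaitinfo({signal.SIGTERM, signal.SIGINT})
    # si_pid == 0 and si_uid == 0 give no hint about the sender, which is
    # exactly what was observed in the container.
    print("got {} from PID {} by user {}".format(
        siginfo.si_signo, siginfo.si_pid, siginfo.si_uid), flush=True)

if __name__ == "__main__":
    main()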

vladak avatar Jul 15 '21 09:07 vladak

When you run the image, maybe you can add this option: "--log-driver=journald".

WSUFan avatar Aug 12 '21 16:08 WSUFan

I just found this issue report, after investigating stopped opengrok docker containers for 2-3 days. On my setup the issue occurs quite often, by which I mean every 1-2 hours.

Is there anything I can try to investigate?

mes85 avatar Aug 26 '21 06:08 mes85

When I tried to debug this last time, the tracepoints I used did not generate any events and siginfo lacked data about what generated the signal. I'd like to retry with the idea from https://github.com/iovisor/bcc/issues/3520#issuecomment-894748803

Maybe even publish a barebones Docker image that can reproduce the problem.

vladak avatar Sep 21 '21 09:09 vladak

I still see this issue with latest and master images. What is the last known version/image that does not have this issue?

AdityaSriramM avatar Oct 28 '21 03:10 AdityaSriramM

@AdityaSriramM For me version 1.6.9 (opengrok/docker:1.6.9) does not have this issue. You may want to bisect if you want a version >= 1.7.0.

lxylxy123456 avatar Nov 01 '21 04:11 lxylxy123456

I can confirm this still exists in the latest image pulled from Docker Hub. An image built from the current master also has this issue.

The Docker image I built hit this error around 4 times in 8 hours in the beginning and then looked stable for 19 hours (with restart: always in the docker-compose config).

# docker ps
CONTAINER ID   IMAGE                 COMMAND               CREATED        STATUS        PORTS                    NAMES
f2fd678d93ee   opengrok-dev:latest   "/scripts/start.py"   27 hours ago   Up 19 hours   0.0.0.0:8888->8080/tcp   opengrok-8888
# docker-compose logs | grep -ie "Connection reset by peer"
opengrok-8888    | could not get configuration from web application: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
opengrok-8888    | could not get configuration from web application: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
opengrok-8888    | could not get configuration from web application: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
opengrok-8888    | could not get configuration from web application: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

The error is the same signal 15 after the indexer finished.

opengrok-8888    | Jun 08, 2022 8:53:16 AM org.opengrok.indexer.util.Statistics logIt
opengrok-8888    | INFO: Done indexing data of all repositories (took 0:01:38)
opengrok-8888    | Jun 08, 2022 8:53:16 AM org.opengrok.indexer.util.Statistics logIt
opengrok-8888    | INFO: Indexer finished (took 0:01:38)
opengrok-8888    |
opengrok-8888    | Received signal 15
opengrok-8888    | Terminating Tomcat <Popen: returncode: None args: ['/usr/local/tomcat/bin/catalina.sh', 'run']>
opengrok-8888    | Sync done
opengrok-8888    | 08-Jun-2022 08:53:17.081 INFO [Thread-27] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-8080"]
opengrok-8888    | 08-Jun-2022 08:53:17.084 INFO [Thread-27] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]
...
opengrok-8888    | 08-Jun-2022 08:53:17.296 INFO [Thread-27] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler ["http-nio-8080"]
opengrok-8888    | 08-Jun-2022 08:53:17.326 INFO [Thread-27] org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler ["http-nio-8080"]
opengrok-8888    | Waiting for reindex to be triggered
opengrok-8888    | could not get configuration from web application: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

leonidwang avatar Jun 11 '22 02:06 leonidwang

[Edit] I stand corrected. After running v1.7.14 longer, I hit the issue again, but it looks like this time there is no 'signal 15' in the log; the issue started with 'org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler' and then connection reset by peer. v1.6.9 has been stable for days. I'll try other versions.


My deployment hadn't been upgraded for around a year, but I removed the old image immediately after pulling the latest one, so I can't remember the last good version; it was the version from about a year ago. Checking the release history, it should be around 1.7.1x, so I tried different versions on several similar hosts.

I hit the error with 1.7.15, while 1.7.14 has been stable for 17 hours. 1.6.9 has also been stable for over 24 hours.

# docker ps
CONTAINER ID   IMAGE                    COMMAND               CREATED        STATUS        PORTS                    NAMES
9fa1ad9664db   opengrok/docker:1.7.14   "/scripts/start.py"   17 hours ago   Up 17 hours   0.0.0.0:8888->8080/tcp   opengrok-8888


# docker ps
CONTAINER ID   IMAGE                COMMAND               CREATED        STATUS        PORTS                    NAMES
6bdaacef97bd   opengrok-dev:1.6.9   "/scripts/start.py"   27 hours ago   Up 24 hours   0.0.0.0:8888->8080/tcp   opengrok-8888

leonidwang avatar Jun 11 '22 02:06 leonidwang

Is there any update on this issue? We noticed it as well, but didn't have the bandwidth to put much more time into it and had to stay on 1.6.9 because of that; 1.7 has so many nice features that we would like to use.

Keinkil avatar Nov 11 '22 13:11 Keinkil