openhab-distro icon indicating copy to clipboard operation
openhab-distro copied to clipboard

OH4.1M5: Very very slow first startup and initial exception (HTTP 404)

Open lolodomo opened this issue 1 year ago • 37 comments

It takes a long time to get a first log and it started by a n exception !

19:39:24.466 [ERROR] [.internal.JSONResponseExceptionMapper] - Unexpected exception occurred while processing REST request.
javax.ws.rs.NotFoundException: HTTP 404 Not Found
        at org.apache.cxf.jaxrs.utils.SpecExceptions.toNotFoundException(SpecExceptions.java:89) ~[bundleFile:3.6.2]
        at org.apache.cxf.jaxrs.utils.ExceptionUtils.toNotFoundException(ExceptionUtils.java:128) ~[bundleFile:3.6.2]
        at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.processRequest(JAXRSInInterceptor.java:174) ~[bundleFile:3.6.2]
        at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.handleMessage(JAXRSInInterceptor.java:79) ~[bundleFile:3.6.2]
        at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:234) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:208) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:160) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.servlet.CXFNonSpringServlet.invoke(CXFNonSpringServlet.java:225) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.servlet.AbstractHTTPServlet.handleRequest(AbstractHTTPServlet.java:304) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.servlet.AbstractHTTPServlet.doPost(AbstractHTTPServlet.java:217) ~[bundleFile:3.6.2]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:517) ~[bundleFile:4.0.4]
        at org.apache.cxf.transport.servlet.AbstractHTTPServlet.service(AbstractHTTPServlet.java:279) ~[bundleFile:3.6.2]
        at org.ops4j.pax.web.service.spi.servlet.OsgiInitializedServlet.service(OsgiInitializedServlet.java:102) ~[bundleFile:?]
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656) ~[bundleFile:9.4.52.v20230823]
        at org.ops4j.pax.web.service.spi.servlet.OsgiFilterChain.doFilter(OsgiFilterChain.java:100) ~[bundleFile:?]
        at org.ops4j.pax.web.service.jetty.internal.PaxWebServletHandler.doHandle(PaxWebServletHandler.java:320) ~[bundleFile:?]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234) ~[bundleFile:9.4.52.v20230823]
        at org.ops4j.pax.web.service.jetty.internal.PrioritizedHandlerCollection.handle(PrioritizedHandlerCollection.java:96) ~[bundleFile:?]
        at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:722) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) [bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) [bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) [bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) [bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) [bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) [bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) [bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) [bundleFile:9.4.52.v20230823]
        at java.lang.Thread.run(Unknown Source) [?:?]
19:39:25.362 [INFO ] [org.openhab.core.Activator           ] - Starting openHAB 4.1.0.M5 (build Milestone Build)

It was never as long before. I am curious to know what is this HTTP request returning 404. But finally the server started properly. After the initial exception, the time to start is similar as before.

lolodomo avatar Dec 17 '23 18:12 lolodomo

@openhab/core-maintainers for your information. I hope this is not the new IP scan ?

lolodomo avatar Dec 17 '23 18:12 lolodomo

I do not get the same exception after a restart. So it happens apparently only at the very first start. But I have the strange feeling that the server is blocked at a certain time of the startup and then unblocked after a delay. I could compare the time to start between M3 and M5.

lolodomo avatar Dec 17 '23 19:12 lolodomo

But I have the strange feeling that the server is blocked at a certain time of the startup and then unblocked after a delay. I could compare the time to start between M3 and M5.

Time of startup is finally similar between M3 and M5, around 4 minutes for me.

So the problem I am reporting is only the time to start at the very first startup and the associated exception that is logged.

lolodomo avatar Dec 17 '23 19:12 lolodomo

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/openhab-4-1-milestone-discussion/149502/146

openhab-bot avatar Dec 17 '23 19:12 openhab-bot

Even if that exception is thrown because a client requests a resource not available yet, I would say, that exception should be suppressed.

florian-h05 avatar Dec 18 '23 08:12 florian-h05

I agree, @florian-h05. It definitely looks as if there was some HTTP request coming in while the system was starting up and couldn't yet serve the resources. I have created https://github.com/openhab/openhab-core/pull/3931 to not log errors in such a case anymore.

@lolodomo Wrt start times, I am not sure how to best investigate this. So it only happens at the very first start? Can you somehow reproduce it on a test system? Was it maybe slow to install the remote add-ons because our artifactory might have had network issues?

kaikreuzer avatar Dec 18 '23 22:12 kaikreuzer

Was it maybe slow to install the remote add-ons because our artifactory might have had network issues?

I am using the kar file and remote is disabled. Hope that kar file is still used.

lolodomo avatar Dec 18 '23 22:12 lolodomo

It definitely looks as if there was some HTTP request coming in while the system was starting up

Strange but maybe you're right. I will see if the problem happens again with RC1.

lolodomo avatar Dec 18 '23 22:12 lolodomo

I have upgrade to the snapshot yesterday evening and also experienced an extremely slow start-up. It took about 4 minutes until any log output was generated, and then I received plenty of the error messages above. Normally, a few of those were „normal“ because I have a few devices in the network regularly polling the REST API. I guess the high number of those messages has something to do with the extremely slow startup and is only a symptom, not the root cause.

florian-h05 avatar Dec 20 '23 08:12 florian-h05

I guess the high number of those messages has something to do with the extremely slow startup and is only a symptom, not the root cause.

Yes, I would guess so as well. With today's snapshot, you shouldn't see any such log messages anymore as https://github.com/openhab/openhab-core/pull/3931 is now contained.

kaikreuzer avatar Dec 20 '23 08:12 kaikreuzer

Hmm, with the latest snapshot I am getting this error:

2023-12-20 11:16:32.751 [ERROR] [internal.JSONResponseExceptionMapper] - Unexpected exception occurred while processing REST request.
javax.ws.rs.ClientErrorException: HTTP 404 Not Found
	at org.apache.cxf.jaxrs.utils.SpecExceptions.toHttpException(SpecExceptions.java:118) ~[bundleFile:3.6.2]
	at org.apache.cxf.jaxrs.utils.ExceptionUtils.toHttpException(ExceptionUtils.java:168) ~[bundleFile:3.6.2]
	at org.apache.cxf.jaxrs.utils.JAXRSUtils.findTargetMethod(JAXRSUtils.java:673) ~[bundleFile:3.6.2]
	at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.processRequest(JAXRSInInterceptor.java:182) ~[bundleFile:3.6.2]
	at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.handleMessage(JAXRSInInterceptor.java:79) ~[bundleFile:3.6.2]
	at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307) ~[bundleFile:3.6.2]

florian-h05 avatar Dec 20 '23 10:12 florian-h05

Looks as if we have to catch yet another exception type...

kaikreuzer avatar Dec 20 '23 10:12 kaikreuzer

I wonder why we use WebClientException in ItemResource l. 488 instead of returning a 404 response (like in other places).

J-N-K avatar Dec 20 '23 10:12 J-N-K

I don't see any good reason for it, @J-N-K. I have changed it with https://github.com/openhab/openhab-core/pull/3940.

kaikreuzer avatar Dec 20 '23 10:12 kaikreuzer

I have upgrade to the snapshot yesterday evening and also experienced an extremely slow start-up. It took about 4 minutes until any log output was generated,

It was the same for me with M5, no log before a very long time. If there is a new snapshot this evening, I will try again and compute how much time to get the very first log. And I will compare first start and next starts.

lolodomo avatar Dec 20 '23 12:12 lolodomo

I have tried out several recent snapshots, including the latest one 3792, and with none I have experienced a slow (first) startup again. I have always cleaned the cache before starting a the newly installed snapshot.

florian-h05 avatar Dec 20 '23 13:12 florian-h05

@lolodomo & @florian-h05 We have a new snapshot distro (3794) available now.

kaikreuzer avatar Dec 20 '23 13:12 kaikreuzer

When upgrading to that snapshot, startup took the same time as usual, great!

florian-h05 avatar Dec 20 '23 17:12 florian-h05

Feedback in few minutes with snapshot 3794...

lolodomo avatar Dec 20 '23 17:12 lolodomo

First startup: 3min23 to get the log entry "Starting openHAB 4.1.0 (build Build #3794)". 5m18 to have another log entry (LSP service). Around 8min20 to have everything loaded and started !!!

If I stop and restart the server: 45s to get the same first log entry and around 3min55 to start everything. This looks bigger than before I believe. There is 40s while nothing happens in logs, just after starting the openHAB Cloud service or maybe this is the load of my rule file ? Note something interesting, some requests form the Netatmo binding were then interrupted.

18:47:38.912 [INFO ] [b.core.model.lsp.internal.ModelServer] - Started Language Server Protocol (LSP) service on port 5007
18:47:40.329 [INFO ] [del.core.internal.ModelRepositoryImpl] - Loading model 'maison.rules'
18:47:50.587 [INFO ] [.io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (xxx)
18:48:29.556 [WARN ] [tty.util.ssl.SslContextFactory.config] - Trusting all certificates configured for Client@f1a324[provider=null,keyStore=null,trustStore=null]
18:48:29.561 [WARN ] [tty.util.ssl.SslContextFactory.config] - No Client EndPointIdentificationAlgorithm configured for Client@f1a324[provider=null,keyStore=null,trustStore=null]
18:48:50.446 [WARN ] [nal.handler.capability.HomeCapability] - Error getting Home informations: Request interrupted
18:48:50.438 [WARN ] [.handler.capability.WeatherCapability] - Error retrieving weather data 'xxx' : Request interrupted
18:48:50.539 [WARN ] [handler.capability.SecurityCapability] - Error retrieving last events for home 'xxx' : Request interrupted

I will compare normal start with 4.1 M3 later this evening.

lolodomo avatar Dec 20 '23 17:12 lolodomo

4.1 M3

  • 29s to get the banner
  • around 3min58s to load everything
  • around 45s to stop the server

Comparing a (re)start of 4.1 M3 and snapshot 3794 (meaning not the initial startup) is relatively similar in duration.

Stop of the server is 45s with 4.1 M3 while 56s with snapshot 3794.

Maybe the real problem is the 8 minutes for the very first startup of snapshot 3794. Remember that I use a kar file for addons so there is even nothing to download.

I will now compare every 4.1 milestones on my PC with a fully empty setup.

lolodomo avatar Dec 20 '23 18:12 lolodomo

Could it be related to the installation of the discovery addons.xml? Since it is a boot feature, it is installed early. The file itself is contained in the .kar and requires the .kar to be processed before it can be installed.

J-N-K avatar Dec 20 '23 19:12 J-N-K

On a Windows PC for the very first startup with a fully empty config and no kar file until "Rule engine started." is displayed:

  • 4.1 M1: around 36s
  • 4.1 M5: around 31s
  • snapshot 3794: around 35s So very similar.

lolodomo avatar Dec 20 '23 19:12 lolodomo

Yes, but that is without .kar. Then the file get's installed from the remote location where the source feature is immediately available.

J-N-K avatar Dec 20 '23 19:12 J-N-K

Since it is a boot feature, it is installed early. The file itself is contained in the .kar

What? We require an external feature for the whole core to start up? That's an issue, I would say - so far, the core is meant to be fully contained in the distro and not dependent on any externally sourced feature. Is there a reason that this file has to be in a feature or could it simply be packaged into the distro?

kaikreuzer avatar Dec 20 '23 19:12 kaikreuzer

Yes, but that is without .kar. Then the file get's installed from the remote location where the source feature is immediately available.

If I try again on PC with snapshot 3794 but this time with the kar file in addons, still with an empty config, it takes around the same time (38s).

lolodomo avatar Dec 20 '23 19:12 lolodomo

I can try again on my RPI the very first startup without the kar file.

lolodomo avatar Dec 20 '23 19:12 lolodomo

Very first startup on my RPI without the kar file and with setting remote set to true: yes, it is faster, but still around 7 minutes.

lolodomo avatar Dec 20 '23 19:12 lolodomo

For the second start, I get approximatively the same start time if I have initially the kar file or not. This is logic as I guess the kar file is not used at the second start.

lolodomo avatar Dec 20 '23 20:12 lolodomo

My SD card is now old, I don't know if it could explain such durations.

lolodomo avatar Dec 20 '23 20:12 lolodomo