openhab-distro
openhab-distro copied to clipboard
OH4.1M5: Very very slow first startup and initial exception (HTTP 404)
It takes a long time to get a first log and it started by a n exception !
19:39:24.466 [ERROR] [.internal.JSONResponseExceptionMapper] - Unexpected exception occurred while processing REST request.
javax.ws.rs.NotFoundException: HTTP 404 Not Found
at org.apache.cxf.jaxrs.utils.SpecExceptions.toNotFoundException(SpecExceptions.java:89) ~[bundleFile:3.6.2]
at org.apache.cxf.jaxrs.utils.ExceptionUtils.toNotFoundException(ExceptionUtils.java:128) ~[bundleFile:3.6.2]
at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.processRequest(JAXRSInInterceptor.java:174) ~[bundleFile:3.6.2]
at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.handleMessage(JAXRSInInterceptor.java:79) ~[bundleFile:3.6.2]
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307) ~[bundleFile:3.6.2]
at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121) ~[bundleFile:3.6.2]
at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265) ~[bundleFile:3.6.2]
at org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:234) ~[bundleFile:3.6.2]
at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:208) ~[bundleFile:3.6.2]
at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:160) ~[bundleFile:3.6.2]
at org.apache.cxf.transport.servlet.CXFNonSpringServlet.invoke(CXFNonSpringServlet.java:225) ~[bundleFile:3.6.2]
at org.apache.cxf.transport.servlet.AbstractHTTPServlet.handleRequest(AbstractHTTPServlet.java:304) ~[bundleFile:3.6.2]
at org.apache.cxf.transport.servlet.AbstractHTTPServlet.doPost(AbstractHTTPServlet.java:217) ~[bundleFile:3.6.2]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:517) ~[bundleFile:4.0.4]
at org.apache.cxf.transport.servlet.AbstractHTTPServlet.service(AbstractHTTPServlet.java:279) ~[bundleFile:3.6.2]
at org.ops4j.pax.web.service.spi.servlet.OsgiInitializedServlet.service(OsgiInitializedServlet.java:102) ~[bundleFile:?]
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656) ~[bundleFile:9.4.52.v20230823]
at org.ops4j.pax.web.service.spi.servlet.OsgiFilterChain.doFilter(OsgiFilterChain.java:100) ~[bundleFile:?]
at org.ops4j.pax.web.service.jetty.internal.PaxWebServletHandler.doHandle(PaxWebServletHandler.java:320) ~[bundleFile:?]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234) ~[bundleFile:9.4.52.v20230823]
at org.ops4j.pax.web.service.jetty.internal.PrioritizedHandlerCollection.handle(PrioritizedHandlerCollection.java:96) ~[bundleFile:?]
at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:722) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) [bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) [bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) [bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) [bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) [bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) [bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) [bundleFile:9.4.52.v20230823]
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) [bundleFile:9.4.52.v20230823]
at java.lang.Thread.run(Unknown Source) [?:?]
19:39:25.362 [INFO ] [org.openhab.core.Activator ] - Starting openHAB 4.1.0.M5 (build Milestone Build)
It was never as long before. I am curious to know what is this HTTP request returning 404. But finally the server started properly. After the initial exception, the time to start is similar as before.
@openhab/core-maintainers for your information. I hope this is not the new IP scan ?
I do not get the same exception after a restart. So it happens apparently only at the very first start. But I have the strange feeling that the server is blocked at a certain time of the startup and then unblocked after a delay. I could compare the time to start between M3 and M5.
But I have the strange feeling that the server is blocked at a certain time of the startup and then unblocked after a delay. I could compare the time to start between M3 and M5.
Time of startup is finally similar between M3 and M5, around 4 minutes for me.
So the problem I am reporting is only the time to start at the very first startup and the associated exception that is logged.
This issue has been mentioned on openHAB Community. There might be relevant details there:
https://community.openhab.org/t/openhab-4-1-milestone-discussion/149502/146
Even if that exception is thrown because a client requests a resource not available yet, I would say, that exception should be suppressed.
I agree, @florian-h05. It definitely looks as if there was some HTTP request coming in while the system was starting up and couldn't yet serve the resources. I have created https://github.com/openhab/openhab-core/pull/3931 to not log errors in such a case anymore.
@lolodomo Wrt start times, I am not sure how to best investigate this. So it only happens at the very first start? Can you somehow reproduce it on a test system? Was it maybe slow to install the remote add-ons because our artifactory might have had network issues?
Was it maybe slow to install the remote add-ons because our artifactory might have had network issues?
I am using the kar file and remote is disabled. Hope that kar file is still used.
It definitely looks as if there was some HTTP request coming in while the system was starting up
Strange but maybe you're right. I will see if the problem happens again with RC1.
I have upgrade to the snapshot yesterday evening and also experienced an extremely slow start-up. It took about 4 minutes until any log output was generated, and then I received plenty of the error messages above. Normally, a few of those were „normal“ because I have a few devices in the network regularly polling the REST API. I guess the high number of those messages has something to do with the extremely slow startup and is only a symptom, not the root cause.
I guess the high number of those messages has something to do with the extremely slow startup and is only a symptom, not the root cause.
Yes, I would guess so as well. With today's snapshot, you shouldn't see any such log messages anymore as https://github.com/openhab/openhab-core/pull/3931 is now contained.
Hmm, with the latest snapshot I am getting this error:
2023-12-20 11:16:32.751 [ERROR] [internal.JSONResponseExceptionMapper] - Unexpected exception occurred while processing REST request.
javax.ws.rs.ClientErrorException: HTTP 404 Not Found
at org.apache.cxf.jaxrs.utils.SpecExceptions.toHttpException(SpecExceptions.java:118) ~[bundleFile:3.6.2]
at org.apache.cxf.jaxrs.utils.ExceptionUtils.toHttpException(ExceptionUtils.java:168) ~[bundleFile:3.6.2]
at org.apache.cxf.jaxrs.utils.JAXRSUtils.findTargetMethod(JAXRSUtils.java:673) ~[bundleFile:3.6.2]
at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.processRequest(JAXRSInInterceptor.java:182) ~[bundleFile:3.6.2]
at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.handleMessage(JAXRSInInterceptor.java:79) ~[bundleFile:3.6.2]
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307) ~[bundleFile:3.6.2]
Looks as if we have to catch yet another exception type...
I wonder why we use WebClientException
in ItemResource
l. 488 instead of returning a 404 response (like in other places).
I don't see any good reason for it, @J-N-K. I have changed it with https://github.com/openhab/openhab-core/pull/3940.
I have upgrade to the snapshot yesterday evening and also experienced an extremely slow start-up. It took about 4 minutes until any log output was generated,
It was the same for me with M5, no log before a very long time. If there is a new snapshot this evening, I will try again and compute how much time to get the very first log. And I will compare first start and next starts.
I have tried out several recent snapshots, including the latest one 3792, and with none I have experienced a slow (first) startup again. I have always cleaned the cache before starting a the newly installed snapshot.
@lolodomo & @florian-h05 We have a new snapshot distro (3794) available now.
When upgrading to that snapshot, startup took the same time as usual, great!
Feedback in few minutes with snapshot 3794...
First startup: 3min23 to get the log entry "Starting openHAB 4.1.0 (build Build #3794)". 5m18 to have another log entry (LSP service). Around 8min20 to have everything loaded and started !!!
If I stop and restart the server: 45s to get the same first log entry and around 3min55 to start everything. This looks bigger than before I believe. There is 40s while nothing happens in logs, just after starting the openHAB Cloud service or maybe this is the load of my rule file ? Note something interesting, some requests form the Netatmo binding were then interrupted.
18:47:38.912 [INFO ] [b.core.model.lsp.internal.ModelServer] - Started Language Server Protocol (LSP) service on port 5007
18:47:40.329 [INFO ] [del.core.internal.ModelRepositoryImpl] - Loading model 'maison.rules'
18:47:50.587 [INFO ] [.io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (xxx)
18:48:29.556 [WARN ] [tty.util.ssl.SslContextFactory.config] - Trusting all certificates configured for Client@f1a324[provider=null,keyStore=null,trustStore=null]
18:48:29.561 [WARN ] [tty.util.ssl.SslContextFactory.config] - No Client EndPointIdentificationAlgorithm configured for Client@f1a324[provider=null,keyStore=null,trustStore=null]
18:48:50.446 [WARN ] [nal.handler.capability.HomeCapability] - Error getting Home informations: Request interrupted
18:48:50.438 [WARN ] [.handler.capability.WeatherCapability] - Error retrieving weather data 'xxx' : Request interrupted
18:48:50.539 [WARN ] [handler.capability.SecurityCapability] - Error retrieving last events for home 'xxx' : Request interrupted
I will compare normal start with 4.1 M3 later this evening.
4.1 M3
- 29s to get the banner
- around 3min58s to load everything
- around 45s to stop the server
Comparing a (re)start of 4.1 M3 and snapshot 3794 (meaning not the initial startup) is relatively similar in duration.
Stop of the server is 45s with 4.1 M3 while 56s with snapshot 3794.
Maybe the real problem is the 8 minutes for the very first startup of snapshot 3794. Remember that I use a kar file for addons so there is even nothing to download.
I will now compare every 4.1 milestones on my PC with a fully empty setup.
Could it be related to the installation of the discovery addons.xml
? Since it is a boot feature, it is installed early. The file itself is contained in the .kar and requires the .kar to be processed before it can be installed.
On a Windows PC for the very first startup with a fully empty config and no kar file until "Rule engine started." is displayed:
- 4.1 M1: around 36s
- 4.1 M5: around 31s
- snapshot 3794: around 35s So very similar.
Yes, but that is without .kar. Then the file get's installed from the remote location where the source feature is immediately available.
Since it is a boot feature, it is installed early. The file itself is contained in the .kar
What? We require an external feature for the whole core to start up? That's an issue, I would say - so far, the core is meant to be fully contained in the distro and not dependent on any externally sourced feature. Is there a reason that this file has to be in a feature or could it simply be packaged into the distro?
Yes, but that is without .kar. Then the file get's installed from the remote location where the source feature is immediately available.
If I try again on PC with snapshot 3794 but this time with the kar file in addons, still with an empty config, it takes around the same time (38s).
I can try again on my RPI the very first startup without the kar file.
Very first startup on my RPI without the kar file and with setting remote set to true: yes, it is faster, but still around 7 minutes.
For the second start, I get approximatively the same start time if I have initially the kar file or not. This is logic as I guess the kar file is not used at the second start.
My SD card is now old, I don't know if it could explain such durations.