Slow UI/GraphQL endpoints after upgrade to v1.1.0
Hello, we are running DataHub on EKS on AWS, with OpenSearch and AWS MSK.
After upgrading from v0.15.0 to v1.1.0 (I went to v1.0.0 first), I noticed that the UI takes ages to load. Looking at the network calls, I can see that it is mainly the GraphQL calls that are slow.
I suspected that this could be linked to OpenSearch. The first thing I noticed is that the number of searchable documents in OpenSearch tripled after running the systemUpdate pod. Is this normal? Could this increase cause the slowness?
Thanks
hey @moenesbs! thanks for raising this concern. my first question for you is are you running the new UI or are you still on the old UI after your upgrade? also, do you know anything more about the number of documents in opensearch that increased? any particular entity index for example?
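One way to answer the per-index question is to snapshot document counts before and after running systemUpdate and compare them. The sketch below is a minimal Python example, not something taken from this thread: it assumes the cluster's `_cat/indices` endpoint is reachable without extra authentication (for example through a port-forward), and `fetch_doc_counts`, `parse_doc_counts`, and `largest_growth` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Dict, List, Tuple


def parse_doc_counts(rows: List[dict]) -> Dict[str, int]:
    """Turn _cat/indices JSON rows into {index name: document count}."""
    # docs.count comes back as a string and can be null (e.g. closed indices).
    return {row["index"]: int(row.get("docs.count") or 0) for row in rows}


def fetch_doc_counts(base_url: str, timeout: float = 30.0) -> Dict[str, int]:
    """Fetch per-index document counts from a search cluster.

    Assumption: base_url (e.g. http://localhost:9200) needs no auth headers.
    """
    url = base_url.rstrip("/") + "/_cat/indices?format=json&h=index,docs.count"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_doc_counts(json.load(response))


def largest_growth(
    before: Dict[str, int], after: Dict[str, int], top: int = 10
) -> List[Tuple[str, int, int]]:
    """Return (index, count before, count after), biggest increase first."""
    rows = [(name, before.get(name, 0), count) for name, count in after.items()]
    rows.sort(key=lambda row: row[2] - row[1], reverse=True)
    return rows[:top]
```

Saving the result of `fetch_doc_counts` before the upgrade job and passing it as `before` afterwards shows which indices account for the increase.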
@chriscollins3456 I can confirm it. After the upgrade to 1.1.0 and switching to the new UI, rendering of the home page takes almost infinite time. Upgrading to 1.2.0 has not resolved the problem. I see a lot of 502 from graphql. We deploy DataHub on GKE with Postgres CloudSQL as the backend DB. I've checked the resource utilization metrics for the frontend, GMS, and Cloud SQL: everything is at about 25-30% (CPU and RAM).
> I see a lot of 502 from graphql.
@Linux-oiD interesting - so are those 502 errors timeout errors for you or some other sort of server errors? if they're timeouts that would seem to match the idea of this github issue here. if they're other server related issues then that might just be a problem with your upgrade and would require checking out the logs of GMS. let me know once you know!
> After upgrade to 1.1.0 and switching to the new UI rendering of the home page
if you turn off the new UI do you still see this issue? or is this always an issue after your upgrade?
@chriscollins3456 yes. It's a timeout. There are no additional errors in GMS log. Switching back to old UI helps.
@Linux-oiD Would you mind attaching the gms logs? 502 should leave a trace somewhere. Thanks!
I also face the same issue. We've deployed datahub on our cluster and after upgrading to the latest version, gms crashes after a while when browsing the UI:
2025-08-29 13:18:03,528 [qtp393476856-184] ERROR i.d.o.c.GlobalControllerExceptionHandler:148 - Unhandled exception occurred for request: /api/graphql
org.springframework.web.context.request.async.AsyncRequestTimeoutException: null
at org.springframework.web.context.request.async.TimeoutDeferredResultProcessingInterceptor.handleTimeout(TimeoutDeferredResultProcessingInterceptor.java:42)
at org.springframework.web.context.request.async.DeferredResultInterceptorChain.triggerAfterTimeout(DeferredResultInterceptorChain.java:81)
at org.springframework.web.context.request.async.WebAsyncManager.lambda$startDeferredResultProcessing$5(WebAsyncManager.java:434)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at org.springframework.web.context.request.async.StandardServletAsyncWebRequest.onTimeout(StandardServletAsyncWebRequest.java:186)
at org.eclipse.jetty.ee10.servlet.ServletChannelState$2.run(ServletChannelState.java:761)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1518)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1511)
at org.eclipse.jetty.ee10.servlet.ServletChannelState.runInContext(ServletChannelState.java:1308)
at org.eclipse.jetty.ee10.servlet.ServletChannelState.onTimeout(ServletChannelState.java:780)
at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:448)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1524)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.lambda$execute$0(ContextHandler.java:1541)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:981)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1211)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1166)
at java.base/java.lang.Thread.run(Thread.java:840)
2025-08-29 13:18:03,527 [qtp393476856-193] ERROR i.d.o.c.GlobalControllerExceptionHandler:148 - Unhandled exception occurred for request: /api/graphql
org.springframework.web.context.request.async.AsyncRequestTimeoutException: null
at org.springframework.web.context.request.async.TimeoutDeferredResultProcessingInterceptor.handleTimeout(TimeoutDeferredResultProcessingInterceptor.java:42)
at org.springframework.web.context.request.async.DeferredResultInterceptorChain.triggerAfterTimeout(DeferredResultInterceptorChain.java:81)
at org.springframework.web.context.request.async.WebAsyncManager.lambda$startDeferredResultProcessing$5(WebAsyncManager.java:434)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at org.springframework.web.context.request.async.StandardServletAsyncWebRequest.onTimeout(StandardServletAsyncWebRequest.java:186)
at org.eclipse.jetty.ee10.servlet.ServletChannelState$2.run(ServletChannelState.java:761)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1518)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1511)
at org.eclipse.jetty.ee10.servlet.ServletChannelState.runInContext(ServletChannelState.java:1308)
at org.eclipse.jetty.ee10.servlet.ServletChannelState.onTimeout(ServletChannelState.java:780)
at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:448)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1524)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.lambda$execute$0(ContextHandler.java:1541)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:981)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1211)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1166)
at java.base/java.lang.Thread.run(Thread.java:840)
2025-08-29 13:18:03,536 [qtp393476856-196] ERROR i.d.o.c.GlobalControllerExceptionHandler:148 - Unhandled exception occurred for request: /api/graphql
org.springframework.web.context.request.async.AsyncRequestTimeoutException: null
at org.springframework.web.context.request.async.TimeoutDeferredResultProcessingInterceptor.handleTimeout(TimeoutDeferredResultProcessingInterceptor.java:42)
at org.springframework.web.context.request.async.DeferredResultInterceptorChain.triggerAfterTimeout(DeferredResultInterceptorChain.java:81)
at org.springframework.web.context.request.async.WebAsyncManager.lambda$startDeferredResultProcessing$5(WebAsyncManager.java:434)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at org.springframework.web.context.request.async.StandardServletAsyncWebRequest.onTimeout(StandardServletAsyncWebRequest.java:186)
at org.eclipse.jetty.ee10.servlet.ServletChannelState$2.run(ServletChannelState.java:761)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1518)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1511)
at org.eclipse.jetty.ee10.servlet.ServletChannelState.runInContext(ServletChannelState.java:1308)
at org.eclipse.jetty.ee10.servlet.ServletChannelState.onTimeout(ServletChannelState.java:780)
at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:448)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1524)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.lambda$execute$0(ContextHandler.java:1541)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:981)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1211)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1166)
at java.base/java.lang.Thread.run(Thread.java:840)
2025-08-29 13:18:03,531 [qtp393476856-188] ERROR i.d.o.c.GlobalControllerExceptionHandler:148 - Unhandled exception occurred for request: /api/graphql
org.springframework.web.context.request.async.AsyncRequestTimeoutException: null
at org.springframework.web.context.request.async.TimeoutDeferredResultProcessingInterceptor.handleTimeout(TimeoutDeferredResultProcessingInterceptor.java:42)
at org.springframework.web.context.request.async.DeferredResultInterceptorChain.triggerAfterTimeout(DeferredResultInterceptorChain.java:81)
at org.springframework.web.context.request.async.WebAsyncManager.lambda$startDeferredResultProcessing$5(WebAsyncManager.java:434)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at org.springframework.web.context.request.async.StandardServletAsyncWebRequest.onTimeout(StandardServletAsyncWebRequest.java:186)
at org.eclipse.jetty.ee10.servlet.ServletChannelState$2.run(ServletChannelState.java:761)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1518)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1511)
at org.eclipse.jetty.ee10.servlet.ServletChannelState.runInContext(ServletChannelState.java:1308)
at org.eclipse.jetty.ee10.servlet.ServletChannelState.onTimeout(ServletChannelState.java:780)
at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:448)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1524)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.lambda$execute$0(ContextHandler.java:1541)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:981)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1211)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1166)
at java.base/java.lang.Thread.run(Thread.java:840)
2025-08-29 13:18:03,538 [qtp393476856-178] ERROR i.d.o.c.GlobalControllerExceptionHandler:148 - Unhandled exception occurred for request: /api/graphql
org.springframework.web.context.request.async.AsyncRequestTimeoutException: null
at org.springframework.web.context.request.async.TimeoutDeferredResultProcessingInterceptor.handleTimeout(TimeoutDeferredResultProcessingInterceptor.java:42)
at org.springframework.web.context.request.async.DeferredResultInterceptorChain.triggerAfterTimeout(DeferredResultInterceptorChain.java:81)
at org.springframework.web.context.request.async.WebAsyncManager.lambda$startDeferredResultProcessing$5(WebAsyncManager.java:434)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at org.springframework.web.context.request.async.StandardServletAsyncWebRequest.onTimeout(StandardServletAsyncWebRequest.java:186)
at org.eclipse.jetty.ee10.servlet.ServletChannelState$2.run(ServletChannelState.java:761)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1518)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1511)
at org.eclipse.jetty.ee10.servlet.ServletChannelState.runInContext(ServletChannelState.java:1308)
at org.eclipse.jetty.ee10.servlet.ServletChannelState.onTimeout(ServletChannelState.java:780)
at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:448)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1524)
at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.lambda$execute$0(ContextHandler.java:1541)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:981)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1211)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1166)
at java.base/java.lang.Thread.run(Thread.java:840)
...
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "BatchSpanProcessor_WorkerThread-1"
2025/08/29 13:21:11 Received signal: terminated
2025/08/29 13:21:21 Killing command due to timeout.
No other relevant logs show up (I've checked the elasticsearch pod as well).
It works for a few minutes, then it freezes (I guess waiting on some async calls to complete) and eventually restarts. CPU and RAM usage go up as well (~4300m, 1.6G). I've raised the limits of the pods, but that didn't help.
When I switch to the old UI, the problem does not appear.
Hey team! Any progress with this issue?
1.3.0 - still same performance issue.
I've been encountering the same issue over the past few months. The slow performance has pretty much made datahub unusable.
We've faced a performance issue on the Glossary page. DataHub's glossaryV2 React components use the getRootGlossaryTerms/getRootGlossaryNodes queries with the rootGlossaryNodeWithFourLayers fragment. On our setup this query took a long time and sometimes timed out.
We rewrote the GraphQL query to fetch only one layer and dropped the description from the returned fields (the descriptions weren't adding any value for end users). The nested layers weren't used for rendering either, because expanding a node triggers a new request for that node's content regardless.
Another issue relates to the Domains page. Unlike the Glossary page, it has no paging, and we have plenty of domains at a single level of the tree. This request also timed out from time to time. The query uses the parentDomainsFields fragment for all domains, but many of the returned fields are not used for rendering the page. So we rewrote the query to use a newly created parentDomainsFieldsForList fragment with the following definition:
fragment parentDomainsFieldsForList on ParentDomainsResult {
    count
    domains {
        urn
        type
        ... on Domain {
            displayProperties {
                ...displayPropertiesFields
            }
            properties {
                name
            }
        }
    }
}
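To check whether a trimmed query like the ones above actually helps, the original and rewritten versions can be timed against the same instance. The following is a minimal Python sketch under assumptions: the GraphQL endpoint URL, the bearer-token header, and the helper names `post_graphql` and `time_calls` are illustrative and need to be adapted to the deployment. The query text is whatever GraphQL document you want to measure.

```python
import json
import statistics
import time
import urllib.request
from typing import Callable, Dict, List, Optional


def post_graphql(
    url: str, query: str, token: Optional[str] = None, timeout: float = 60.0
) -> dict:
    """POST one GraphQL document and return the decoded JSON response.

    Assumption: the endpoint accepts a JSON body of the form {"query": ...}
    and, if a token is given, an "Authorization: Bearer" header.
    """
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def time_calls(call: Callable[[], object], runs: int = 5) -> Dict[str, float]:
    """Run `call` several times and summarize wall-clock latency in seconds."""
    samples: List[float] = []
    for _ in range(runs):
        started = time.perf_counter()
        call()
        samples.append(time.perf_counter() - started)
    return {
        "runs": len(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

For example, `time_calls(lambda: post_graphql(url, original_query, token))` and the same call with the rewritten query give two summaries to compare. Using the median rather than a single run reduces the effect of cache warm-up on the first request.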