OpenMetadata
OpenMetadata copied to clipboard
Feed DAO query performance issue with high feed volume
Affected module Backend Error (specifically FeedDAO)causing the UI unable to show feeds
Describe the bug
When there are a large number of feeds, the listThreadsByEntityLink SQL query times out and throws a PSQLException. The error message indicates that the statement is being canceled due to a statement timeout.
[org.postgresql.util.PSQLException: ERROR: canceling statement due to statement timeout [statement:"/* FeedDAO.listThreadsByEntityLink */ SELECT json FROM thread_entity WHERE type = 'Conversation' AND resolved = false AND updatedAt < 9223372036854775807 AND hash_id in (SELECT fromFQNHash FROM field_relationship WHERE (:fqnPrefixHash IS NULL OR toFQNHash LIKE CONCAT(:fqnPrefixHash, '.%') OR toFQNHash=:fqnPrefixHash) AND fromType='THREAD' AND (:toType IS NULL OR toType LIKE CONCAT(:toType, '.%') OR toType=:toType) AND relation= :relation) AND (:userName IS NULL OR MD5(id) in (SELECT toFQNHash FROM field_relationship WHERE ((fromType='user' AND fromFQNHash= :userName) OR (fromType='team' AND fromFQNHash IN (:__teamNames_0))) AND toType='THREAD' AND relation= :filterRelation) )ORDER BY createdAt DESC LIMIT :limit", arguments:{positional:{1:table,2:11,3:3,6:-1}, named:{toType:table,filterRelation:-1,limit:11,userName:null,__teamNames_0:,fqnPrefixHash:b18959b82c335dd364f7fd8833fc8d16.54b56e667c796ad3c0d55c1774b754f7.6d3c5993ca017d0ff169b425d3193f02.9c0570fa882856fa23042e0cc0dc15ef,relation:3}, finder:[]}]] was thrown while processing request.
To Reproduce
Unfortunately, reproducing this issue requires having a large number of feeds in the database. However, here are some possible steps:
Create a large number of threads/entities with relationships to other threads/entities Run the listThreadsByEntityLink query with a large limit value (e.g. 10000)
Observe the error message and stack trace
Expected behavior
The query should be able to handle a large number of feeds without timing out.
Version:
- OS: Kubernetes
- OpenMetadata version: 1.5.6
- OpenMetadata Ingestion package version: 1.5.6
Additional context This issue only occurs when there are a large number of feeds in the database. The listThreadsByEntityLink query is used to retrieve related threads for a given entity link. It's possible that the query needs to be optimized or that the timeout value should be increased.