OpenMetadata icon indicating copy to clipboard operation
OpenMetadata copied to clipboard

Feed DAO query performance issue with high feed volume

Open elguero6 opened this issue 1 year ago • 0 comments

Affected module Backend Error (specifically FeedDAO)causing the UI unable to show feeds

Describe the bug

When there are a large number of feeds, the listThreadsByEntityLink SQL query times out and throws a PSQLException. The error message indicates that the statement is being canceled due to a statement timeout.

[org.postgresql.util.PSQLException: ERROR: canceling statement due to statement timeout [statement:"/* FeedDAO.listThreadsByEntityLink */ SELECT json FROM thread_entity WHERE type = 'Conversation' AND resolved = false AND updatedAt < 9223372036854775807 AND hash_id in (SELECT fromFQNHash FROM field_relationship WHERE (:fqnPrefixHash IS NULL OR toFQNHash LIKE CONCAT(:fqnPrefixHash, '.%') OR toFQNHash=:fqnPrefixHash) AND fromType='THREAD' AND (:toType IS NULL OR toType LIKE CONCAT(:toType, '.%') OR toType=:toType) AND relation= :relation) AND (:userName IS NULL OR MD5(id) in (SELECT toFQNHash FROM field_relationship WHERE ((fromType='user' AND fromFQNHash= :userName) OR (fromType='team' AND fromFQNHash IN (:__teamNames_0))) AND toType='THREAD' AND relation= :filterRelation) )ORDER BY createdAt DESC LIMIT :limit", arguments:{positional:{1:table,2:11,3:3,6:-1}, named:{toType:table,filterRelation:-1,limit:11,userName:null,__teamNames_0:,fqnPrefixHash:b18959b82c335dd364f7fd8833fc8d16.54b56e667c796ad3c0d55c1774b754f7.6d3c5993ca017d0ff169b425d3193f02.9c0570fa882856fa23042e0cc0dc15ef,relation:3}, finder:[]}]] was thrown while processing request.

To Reproduce

Unfortunately, reproducing this issue requires having a large number of feeds in the database. However, here are some possible steps:

Create a large number of threads/entities with relationships to other threads/entities Run the listThreadsByEntityLink query with a large limit value (e.g. 10000)

Observe the error message and stack trace

Expected behavior

The query should be able to handle a large number of feeds without timing out.

Version:

  • OS: Kubernetes
  • OpenMetadata version: 1.5.6
  • OpenMetadata Ingestion package version: 1.5.6

Additional context This issue only occurs when there are a large number of feeds in the database. The listThreadsByEntityLink query is used to retrieve related threads for a given entity link. It's possible that the query needs to be optimized or that the timeout value should be increased.

elguero6 avatar Oct 10 '24 21:10 elguero6