[BUG] Performance degradation with datacatalog.tags table

Open andrew-freenome opened this issue 2 years ago • 6 comments

Describe the bug

I am seeing a performance bottleneck in the Flyte database. With my workload, the query SELECT * FROM "tags" WHERE ("tags"."artifact_id","tags"."dataset_uuid") IN (($1,$2)) is executed frequently against the datacatalog database (380,000 times in the last day). The workload I'm running has ~380k tasks, so the number of queries makes sense. On average, the query takes 2 seconds to complete and returns 0 rows. I believe it is executed as part of the task cache, which I do have enabled (though I expect every lookup to be a cache miss).

I am using v1.0.1 of Datacatalog and v1.1.47 of FlytePropeller. The database has 32 vCPUs and 64GB of memory, with 200GB of storage, ~50GB of which is used.

Screenshot 2023-12-05 at 12 36 55 PM

Expected behavior

I expect that the database would not be a performance bottleneck, and that the datacatalog.tags table is properly indexed in order to support the queries that are executed against it.

Additional context to reproduce

If I add an index (CREATE INDEX tags_dataset_uuid_artifact_id_idx ON tags (dataset_uuid, artifact_id);), the query gets significantly faster (1000x in my local testing). Without the index, the Postgres planner varies in how it executes the query. The attached screenshots are from a different DB instance under slightly lighter load, but they show the 3 different plans I've seen the planner choose (all without the index).
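For anyone who wants to see the effect without touching a real Datacatalog database, here is a minimal sketch of the same idea. It is not Datacatalog code: it uses Python's built-in sqlite3 module as a stand-in for Postgres, a hypothetical three-column tags table (the real schema has more columns), and the equality form of the lookup instead of the row-value IN form. Only the index definition is taken from the issue. It compares the plan for the lookup before and after the index is created.

```python
import sqlite3

# Hypothetical, simplified stand-in for the datacatalog tags table.
# Assumption: the real table has more columns and lives in Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (dataset_uuid TEXT, artifact_id TEXT, tag_name TEXT)"
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?)",
    [(f"ds-{i % 100}", f"art-{i}", f"tag-{i}") for i in range(10_000)],
)

# Equality form of the lookup from the issue (same predicate on both columns).
LOOKUP = "SELECT * FROM tags WHERE artifact_id = ? AND dataset_uuid = ?"
PARAMS = ("art-missing", "ds-missing")  # no matching row, i.e. a cache miss


def query_plan(connection):
    """Return SQLite's plan description for the lookup as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + LOOKUP, PARAMS).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


plan_without_index = query_plan(conn)

# Same index definition as proposed in the issue.
conn.execute(
    "CREATE INDEX tags_dataset_uuid_artifact_id_idx "
    "ON tags (dataset_uuid, artifact_id)"
)
plan_with_index = query_plan(conn)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Without the index, the plan should be a scan of the whole table. With it, the plan should be a search that names tags_dataset_uuid_artifact_id_idx. On Postgres itself, the equivalent check is to run EXPLAIN (ANALYZE) on the original query before and after creating the index. On a busy production table, CREATE INDEX CONCURRENTLY avoids blocking writes while the index builds.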

Context: https://flyte-org.slack.com/archives/CP2HDHKE1/p1701797879836439

Screenshots

Screenshot 2023-12-06 at 2 03 33 PM
Screenshot 2023-12-06 at 2 03 45 PM
Screenshot 2023-12-06 at 2 03 57 PM

Are you sure this issue hasn't been raised already?

  • [X] Yes

Have you read the Code of Conduct?

  • [X] Yes

andrew-freenome avatar Dec 12 '23 17:12 andrew-freenome

Thank you for opening your first issue here! 🛠

welcome[bot] avatar Dec 12 '23 17:12 welcome[bot]

🤖

Hello @andrew-freenome! I'm Dosu, a friendly bot here to assist you with your issues, answer your queries and help you contribute to the Flyte community. While we're waiting for a human maintainer, I'll be working on your issue about the performance degradation with the datacatalog.tags table. I'll get back to you with a comprehensive answer shortly. Hang tight!

dosubot[bot] avatar Dec 12 '23 17:12 dosubot[bot]

@andrew-freenome, we're investing in a much simpler and more performant implementation of datacatalog. It's bound to show up in Q1 2024.

eapolinario avatar Dec 21 '23 22:12 eapolinario

Would an MR that creates the missing index be welcome? Or, since the alternate implementation you mentioned will obviate the need for it, would it not get merged?

andrew-freenome avatar Dec 21 '23 22:12 andrew-freenome

Is there any update on this issue? A fix was predicted for Q1 2024 but we haven't seen any updates in the release notes about datacatalog. Thanks!

annadcunningham avatar May 06 '24 21:05 annadcunningham

@annadcunningham, unfortunately this project had to be de-prioritized.

@andrew-freenome, would you be willing to contribute the change that creates the missing index?

eapolinario avatar May 07 '24 00:05 eapolinario