Add support for cache TTL
Tracking issue
Closes #3305
Why are the changes needed?
Flyte data may be stored in S3 buckets that have retention policies. If those retention policies delete the underlying data in S3 the entries in Flyte data catalog are stale and point to nonexistent buckets. This results in errors at task runtime in flytekit.
What changes were proposed in this pull request?
This pull request modifies Flyte data catalog such that artifacts in the catalog can expire and no longer be valid. The general approach is to allow for multiple rows with the same artifact key but to only retain one non-expired row ideally. I chose to allow for multiple rows to exist so that cleanup of expired artifacts could be done asynchronously. Also, there are multiple tables that hold references to the artifact table and ensuring these dependents were also expired seemed challenging.
Data Model
- A new column is added to the artifacts table,
expires_at. This dictates when the artifact is no longer valid. - The composite primary key for artifacts was broadened to include
created_atsuch that there can be multiple rows with the same artifact key (ideally with one and only one non-expired entry if insertions are done carefully and atomically). - The compose primary kettle for tag was breaded to include
artifact_idsuch that multiple tags can exist with the same artifact key but can only point to one artifact.
Data Layer
- The data layer was updated to filter expired artifacts in get/list operations, this propagates to tags as well.
- The data layer was updated such that non-expired artifacts are searched before creating new ones.
Manager Layer
- The TTL is used with a pluggable clock to generated the
expires_atvalue for records.
Propeller
Updated propeller to plumb the cache TTL from the task metadata to Flyte data catalog API calls.
Future Updates
At a high level I'd like to follow up on this work with a background job of some sort that queries the DB for expired cache entries and then thoroughly cleans them up by deleting the contents if possible and then later removing the rows from Postgres. Similar to what is done in https://github.com/flyteorg/flyte/pull/4655
I realize flytekit changes will need to follow up here as well but I think those will be relatively trivial and this change is entirely backwards compatible.
How was this patch tested?
Unit tests, for now.
Labels
Please add one or more of the following labels to categorize your PR:
- added: For new features.
- changed: For changes in existing functionality.
- deprecated: For soon-to-be-removed features.
- removed: For features being removed.
- fixed: For any bug fixed.
- security: In case of vulnerabilities
This is important to improve the readability of release notes.
Setup process
Screenshots
Check all the applicable boxes
- [ ] I updated the documentation accordingly.
- [x] All new and existing tests passed.
- [x] All commits are signed-off.
Related PRs
Docs link
Summary by Bito
This pull request enhances the Flyte data catalog by introducing support for cache TTL, allowing artifacts to expire and preventing stale entries that could lead to runtime errors. Key modifications include the addition of an 'expires_at' column in the artifacts table and updates to the data layer for filtering expired artifacts. The implementation is designed to be backward compatible and includes unit tests to validate the new functionality.
Code Review Agent Run Status
- Limitations and other issues: ❌ Failure - Bito Automatic Review Skipped - Draft PR
Bito didn't auto-review because this pull request is in draft status.
To trigger review, mark the PR as ready or type/reviewin the comment and save.
You can change draft PR review settings here, or contact the agent instance creator at [email protected].
Codecov Report
Attention: Patch coverage is 87.64045% with 11 lines in your changes missing coverage. Please review.
Project coverage is 58.53%. Comparing base (
a75cea0) to head (e44b246). Report is 5 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #6451 +/- ##
==========================================
+ Coverage 58.48% 58.53% +0.04%
==========================================
Files 940 940
Lines 71584 71608 +24
==========================================
+ Hits 41867 41915 +48
+ Misses 26534 26516 -18
+ Partials 3183 3177 -6
| Flag | Coverage Δ | |
|---|---|---|
| unittests-datacatalog | 59.69% <90.36%> (+0.65%) |
:arrow_up: |
| unittests-flyteadmin | 56.26% <ø> (ø) |
|
| unittests-flytecopilot | 30.99% <ø> (ø) |
|
| unittests-flytectl | 64.72% <ø> (ø) |
|
| unittests-flyteidl | 76.12% <ø> (ø) |
|
| unittests-flyteplugins | 61.08% <33.33%> (+0.13%) |
:arrow_up: |
| unittests-flytepropeller | 54.78% <66.66%> (+<0.01%) |
:arrow_up: |
| unittests-flytestdlib | 64.04% <ø> (ø) |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Code Review Agent Run Status
- Limitations and other issues: ❌ Failure - Bito Automatic Review Skipped - Draft PR
Bito didn't auto-review because this pull request is in draft status.
To trigger review, mark the PR as ready or type/reviewin the comment and save.
You can change draft PR review settings here, or contact the agent instance creator at [email protected].
Code Review Agent Run Status
- Limitations and other issues: ❌ Failure - Bito Automatic Review Skipped - Draft PR
Bito didn't auto-review because this pull request is in draft status.
To trigger review, mark the PR as ready or type/reviewin the comment and save.
You can change draft PR review settings here, or contact the agent instance creator at [email protected].
Code Review Agent Run Status
- Limitations and other issues: ❌ Failure - Bito Automatic Review Skipped - Draft PR
Bito didn't auto-review because this pull request is in draft status.
To trigger review, mark the PR as ready or type/reviewin the comment and save.
You can change draft PR review settings here, or contact the agent instance creator at [email protected].
Code Review Agent Run Status
- Limitations and other issues: ❌ Failure - Bito Automatic Review Skipped - Draft PR
Bito didn't auto-review because this pull request is in draft status.
To trigger review, mark the PR as ready or type/reviewin the comment and save.
You can change draft PR review settings here, or contact the agent instance creator at [email protected].
Code Review Agent Run Status
- Limitations and other issues: ❌ Failure - Bito Automatic Review Skipped - Draft PR
Bito didn't auto-review because this pull request is in draft status.
To trigger review, mark the PR as ready or type/reviewin the comment and save.
You can change draft PR review settings here, or contact the agent instance creator at [email protected].
Code Review Agent Run Status
- Limitations and other issues: ❌ Failure - Bito Automatic Review Skipped - Draft PR
Bito didn't auto-review because this pull request is in draft status.
To trigger review, mark the PR as ready or type/reviewin the comment and save.
You can change draft PR review settings here, or contact the agent instance creator at [email protected].
From reading you description it sees expired at is a better primary key member
From reading you description it sees expired at is a better primary key member
Yeah that would work as well. Just need to widen that composite key with something.