
[Core feature] FlyteAdmin Continues Reading from Old S3 Bucket After Migration, Causing 500 Errors in UI

Open daadc opened this issue 6 months ago • 3 comments

Motivation: Why do you think this is important?

After migrating Flyte’s underlying S3 storage from bucketA (endpointA) to bucketB (endpointB), FlyteAdmin and the UI still attempt to read from the old bucket, resulting in 500 errors when clicking on task executions.

Initial error when launching tasks:

container:bucketA != Passed Container:bucketB. Dynamic loading is disabled: not found

This was resolved by enabling "multi-container mode" (storage.enable-multicontainer=true).

New error in Flyte UI logs:

{"level":"error","msg":"Failed to read from the raw store [s3://bucketA/metadata/.../test_final_0603_3] Error: not found"}

FlyteAdmin appears to still be using the old bucketA configuration.

Goal: What should the final outcome look like, ideally?

After updating the S3 bucket configuration to bucketB and migrating all objects:

  • FlyteAdmin (and Propeller) should read from and write to s3://bucketB/...
  • No 500 errors or "not found" logs referencing s3://bucketA/... should appear in the UI.

Describe alternatives you've considered

  1. Updating the DB image
  2. Setting storage.enable-multicontainer=true
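For reference, a minimal sketch of what the post-migration storage configuration might look like. The key names follow flytestdlib's storage config, but the endpoint URL, credentials, and region shown here are placeholders, not values from this issue; verify the exact keys against your deployment's Helm values before applying.

```yaml
# Hypothetical Flyte storage config after migrating to bucketB.
storage:
  type: s3
  container: bucketB            # new primary bucket
  enable-multicontainer: true   # allow reads from URIs that reference other buckets (e.g. old bucketA)
  connection:
    endpoint: https://endpointB # placeholder; use your actual S3 endpoint
    auth-type: accesskey
    access-key: <access-key>
    secret-key: <secret-key>
    region: us-east-1           # placeholder region
```

Note that enable-multicontainer only lets the store read URIs naming a different bucket; it does not rewrite stale URIs stored in the database.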

Propose: Link/Inline OR Additional context

Flyte Version: before migration == v1.13.3 | after migration == v1.15.3

Are you sure this issue hasn't been raised already?

  • [x] Yes

Have you read the Code of Conduct?

  • [x] Yes

daadc avatar Jun 06 '25 11:06 daadc

It seems likely that FlyteAdmin and the UI are still referencing the old S3 bucket (bucketA) because metadata in the Flyte database (such as URIs for task inputs/outputs) still points to bucketA, even after updating the storage configuration to bucketB. According to clarifying notes about the data lifecycle, Flyte stores metadata (including references to raw data) in its database, and changing the storage config does not automatically update these references. As a result, FlyteAdmin will continue to attempt to access the old bucket for historical executions, leading to "not found" errors and 500s in the UI.

My understanding is that simply enabling multi-container mode or updating the storage config is not sufficient; you would need to update all metadata references in the database to point to the new bucket, or ensure that the old bucket remains accessible for historical data. There is no mention in the data sources of an automated migration tool for updating these references, so manual intervention (such as a database update or maintaining both buckets) is required to avoid errors for past executions. Would you like more detail on possible migration strategies or database update approaches?
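As an illustration of the manual-intervention approach described above, the sketch below shows a small helper that rewrites stored s3:// URIs from the old bucket to the new one. It is a hedged example: the bucket names are taken from this issue, but any table or column names you would apply it to in FlyteAdmin's database are assumptions about the schema, not confirmed details, so inspect the schema (and take a backup) before running any bulk update.

```python
def rewrite_uri(uri: str, old_bucket: str = "bucketA", new_bucket: str = "bucketB") -> str:
    """Rewrite an s3:// URI from the old bucket to the new one.

    URIs that do not start with the old bucket's prefix are returned unchanged,
    so the function is safe to apply to mixed columns of URIs.
    """
    old_prefix = f"s3://{old_bucket}/"
    if uri.startswith(old_prefix):
        return f"s3://{new_bucket}/" + uri[len(old_prefix):]
    return uri


# Example: rewriting a metadata URI like the one in the error log
# (the path component here is illustrative, not the truncated path from the log).
print(rewrite_uri("s3://bucketA/metadata/propeller/test-run/inputs.pb"))
```

A production migration would apply this transformation to each URI-bearing column in the FlyteAdmin database (or simply keep bucketA readable so historical executions continue to resolve).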

This is an AI-generated response.

runllm[bot] avatar Jun 06 '25 11:06 runllm[bot]

#take

machichima avatar Jun 09 '25 00:06 machichima