[Bug] Streaming read with `from-timestamp` may occasionally throw an exception (snapshot file not found)
Search before asking
- [X] I searched in the issues and found nothing similar.
Paimon version
0.8
Compute Engine
Flink
Minimal reproduce step
Analyze the source code in SnapshotManager.class:
when the earliest snapshot file is read at step 2, the snapshot may already have expired and been deleted.
What doesn't meet your expectations?
The query job fails with a FileNotFoundException.
Anything else?
No response
Are you willing to submit a PR?
- [ ] I'm willing to submit a PR!
@Mr-j-yangyu Have you tried pointing to a safer timestamp to read from (keeping a margin from the oldest snapshot/changelog)?
@Aitozi Reading the earliest snapshot or changelog is necessary in some usage scenarios. Could we add logic to verify that the file exists when reading the earliest snapshot or changelog, and just read the next one if it does not, to reduce the probability of the exception?
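A minimal sketch of the fallback idea, for illustration only. `SnapshotLoader`, `loadFirstAvailable`, and the in-memory store are hypothetical names made up for this example and are not Paimon's API. The assumption is that a concurrently expired snapshot surfaces as a `FileNotFoundException`, in which case the next id is tried:

```java
import java.io.FileNotFoundException;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

class EarliestSnapshotFallback {

    /** Hypothetical stand-in for whatever loads a snapshot by id; not a Paimon interface. */
    interface SnapshotLoader {
        String load(long snapshotId) throws FileNotFoundException;
    }

    /**
     * Tries ids from earliest to latest (inclusive) and returns the first snapshot that loads.
     * A FileNotFoundException is treated as "expired concurrently" and the next id is tried.
     */
    static Optional<String> loadFirstAvailable(long earliest, long latest, SnapshotLoader loader) {
        for (long id = earliest; id <= latest; id++) {
            try {
                return Optional.of(loader.load(id));
            } catch (FileNotFoundException e) {
                // The snapshot was deleted after the earliest id was resolved; try the next id.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // In-memory stand-in for the snapshot directory: id 1 has already been expired.
        Map<Long, String> store = new TreeMap<>();
        store.put(2L, "snapshot-2");
        store.put(3L, "snapshot-3");

        SnapshotLoader loader = id -> {
            String snapshot = store.get(id);
            if (snapshot == null) {
                throw new FileNotFoundException("snapshot-" + id);
            }
            return snapshot;
        };

        // The caller still believes the earliest id is 1; the read falls through to id 2.
        System.out.println(loadFirstAvailable(1, 3, loader).orElse("none")); // prints "snapshot-2"
    }
}
```

Catching the exception on read, rather than checking existence first, avoids a second race between the existence check and the read itself.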
Hi @Mr-j-yangyu, could you give detailed minimal reproduction steps? The code in this picture will get the earliest+1 snapshot in the end.
Hi @Mr-j-yangyu @discivigour, I have found a similar issue with SnapshotManager#earlierOrEqualTimeMills. An exception is thrown complaining that the snapshot file does not exist. It is a bit difficult to reproduce because it only happens when the earliest snapshot expires between step 1 and step 2 in the code picture. Is anybody working on a fix?
Hi, I have opened a fix for this problem in #4930. PTAL.