
Cache database doesn't always load the latest job state

Open cfbao opened this issue 5 years ago • 2 comments

Currently, the cache database finds the latest state of a job by ordering on timestamp: int and then on tries: int. This can be unreliable. If a job experiences errors and is quickly retried with success, all attempts can fall within the same second, and the cache could record something like:

| id | guid | timestamp | data | tries |
| --- | --- | --- | --- | --- |
| 1 | ... | 1544755012 | ... | 1 |
| 2 | ... | 1544755012 | ... | 2 |
| 3 | ... | 1544755012 | ... | 0 |

Ordering by timestamp then by tries would give the second row as the latest state, whereas in reality the third row is the latest.
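To make the failure concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than minidb. The table name `cache`, the guid `"job"`, and the `data` values are placeholders I made up for illustration; the query is only an approximation of "order by timestamp, then by tries", not the actual query urlwatch issues.

```python
import sqlite3

# Hypothetical stand-in for the cache table; not the real minidb schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cache ("
    "id INTEGER PRIMARY KEY, guid TEXT, timestamp INTEGER, data TEXT, tries INTEGER)"
)

# Two failed attempts followed by a success, all within the same second.
rows = [
    ("job", 1544755012, "error", 1),
    ("job", 1544755012, "error", 2),
    ("job", 1544755012, "ok", 0),  # the real latest state
]
conn.executemany(
    "INSERT INTO cache (guid, timestamp, data, tries) VALUES (?, ?, ?, ?)", rows
)

# "Latest" as defined by timestamp first, tries second.
picked = conn.execute(
    "SELECT id, data, tries FROM cache WHERE guid = ? "
    "ORDER BY timestamp DESC, tries DESC LIMIT 1",
    ("job",),
).fetchone()

print(picked)  # (2, 'error', 2): the second row, not the successful third one
```

Because the timestamps tie, `tries` decides the order, and the successful run with `tries = 0` sorts last.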

Two possible solutions come to mind:

  1. Save timestamp as float rather than int. This should give enough precision in most cases, but I'm not sure if it's enough on all systems.
  2. Use id to order records. id seems to be strictly increasing, but this behavior is not documented in minidb, so I'm also a bit hesitant to use it.

In any case, tries should not be used for ordering, since it can reset to zero at any point.

This issue is usually not a problem in daily use of urlwatch, but it is unavoidable in tests and must be resolved before the tests can be improved.

cfbao, Dec 14 '18 02:12

Seems that the behavior of id is not defined in minidb, but left to SQLite. Because the AUTOINCREMENT keyword is not used, there's no guarantee that id is always increasing, so option 2 isn't a viable choice.
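A small sketch of what this means in plain SQLite, again using Python's `sqlite3` directly instead of minidb. The table names `plain` and `auto` and the helper `insert_delete_insert` are hypothetical. Without AUTOINCREMENT, SQLite normally assigns the current maximum rowid plus one, so the id of a deleted last row can be handed out again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same shape of table, with and without the AUTOINCREMENT keyword.
conn.execute("CREATE TABLE plain (id INTEGER PRIMARY KEY, data TEXT)")
conn.execute("CREATE TABLE auto (id INTEGER PRIMARY KEY AUTOINCREMENT, data TEXT)")


def insert_delete_insert(table):
    """Insert three rows, delete the newest one, insert again, return the new id."""
    for value in ("a", "b", "c"):
        conn.execute(f"INSERT INTO {table} (data) VALUES (?)", (value,))
    conn.execute(f"DELETE FROM {table} WHERE id = 3")
    cursor = conn.execute(f"INSERT INTO {table} (data) VALUES (?)", ("d",))
    return cursor.lastrowid


plain_id = insert_delete_insert("plain")
auto_id = insert_delete_insert("auto")

print(plain_id)  # 3: the deleted id is reused
print(auto_id)   # 4: AUTOINCREMENT never reuses an id
```

Whether this would ever bite in practice depends on whether the newest rows can be deleted, which I have not checked; the point is only that the ordering is not guaranteed.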

cfbao, Dec 14 '18 17:12

It also turns out that a float timestamp doesn't give enough precision in testing, even on my own computer. I guess I'll have to manually add some wait time in tests to guarantee success.
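A rough way to check this on a given machine, as a sketch only: sample `time.time()` in a tight loop and count how many distinct values come back. The sample size of 1000 is arbitrary, and the resolution reported by `time.get_clock_info` is what the platform claims, which may differ from what is observed.

```python
import time

# Resolution of the clock behind time.time(), as reported by the platform.
resolution = time.get_clock_info("time").resolution

# Take many back-to-back readings, as a fast retry loop in a test would.
samples = [time.time() for _ in range(1000)]
distinct = len(set(samples))

print(f"reported resolution: {resolution} s")
print(f"{distinct} distinct timestamps out of {len(samples)} samples")
```

If `distinct` is smaller than the number of samples, two writes made in quick succession can receive identical float timestamps on that system.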

I don't expect it to be a real issue in normal usage though.

Edit: a better solution probably involves a redesign of the cache database. Timestamp can still be int, but another table would store the id of the latest snapshot of a particular job.
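A minimal sketch of that idea, assuming plain `sqlite3` instead of minidb. The table names `snapshot` and `latest` and the functions `save` and `load_latest` are hypothetical, not part of urlwatch. Each write updates a per-job pointer in the same transaction, so finding the latest state no longer depends on timestamp, tries, or id ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE snapshot (
        id INTEGER PRIMARY KEY,
        guid TEXT NOT NULL,
        timestamp INTEGER NOT NULL,
        data TEXT,
        tries INTEGER NOT NULL
    );
    -- One row per job, pointing at its most recently written snapshot.
    CREATE TABLE latest (
        guid TEXT PRIMARY KEY,
        snapshot_id INTEGER NOT NULL REFERENCES snapshot(id)
    );
    """
)


def save(guid, timestamp, data, tries):
    """Store a snapshot and move the job's 'latest' pointer to it atomically."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO snapshot (guid, timestamp, data, tries) VALUES (?, ?, ?, ?)",
            (guid, timestamp, data, tries),
        )
        conn.execute(
            "INSERT OR REPLACE INTO latest (guid, snapshot_id) VALUES (?, ?)",
            (guid, cursor.lastrowid),
        )


def load_latest(guid):
    """Return (timestamp, data, tries) of the last saved snapshot, or None."""
    return conn.execute(
        "SELECT s.timestamp, s.data, s.tries FROM snapshot AS s "
        "JOIN latest AS l ON l.snapshot_id = s.id WHERE l.guid = ?",
        (guid,),
    ).fetchone()


# Same scenario as in the table above: two failures, then a success.
save("job", 1544755012, "error", 1)
save("job", 1544755012, "error", 2)
save("job", 1544755012, "ok", 0)
print(load_latest("job"))  # (1544755012, 'ok', 0)
```

The pointer reflects write order rather than any stored value, so identical int timestamps are harmless here.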

cfbao, Dec 14 '18 18:12