DiskLruCache icon indicating copy to clipboard operation
DiskLruCache copied to clipboard

Silent commit failure due to ignore renameTo return value

Open TWiStErRob opened this issue 11 years ago • 7 comments

DiskLruCache.Editor editor = getDiskCache().edit(key);
try {
    File file = editor.getFile(0);
    new FileOutputStream(file).write(data);
    editor.commit();
} finally {
    editor.abortUnlessCommitted();
}

Now, commit calls completeEdit which has a line dirty.renameTo(clean); File.renameTo has a return value which is discarded. This may fail silently due to a clumsy way of writing the file (as seen above) and think that the commit was commited. However later when getDiskCache.get(key) is called it sees that the file is missing for the wrong reason.

The reason for renameTo failing is because there still a handle open for it in the process and the OS (Windows) may prevent the rename and return false. (side note: the strange thing is that in the past I've been able to rename running/and-locked .exe files without any problems from non-Java apps)

The line was mentioned in #67 where if you check the return value the operation will be still atomic and should throw an IOException as you do in https://github.com/JakeWharton/DiskLruCache/commit/4c31913192eae6ae6bf8a239e89844a0ab2455e7 which is a fix for #32.

Also there are other 3 File.delete calls which are not checked, mostly related to backup files but still worth checking the code (and intentionally ignoring it with a comment maybe).

This issue was discovered when trying to build: https://github.com/sjudd/glide/tree/3.0a

TWiStErRob avatar Aug 24 '14 20:08 TWiStErRob

Probably highly related that I can't mvn package your library on my Windows 7x64 with Java 1.7.0_05-b06:

Running com.jakewharton.disklrucache.DiskLruCacheTest
Journal compacted from 73 bytes to 57 bytes
Journal compacted from 40033 bytes to 97 bytes
Journal compacted from 14059 bytes to 64 bytes
Journal compacted from 20073 bytes to 65 bytes
DiskLruCache C:\Users\TWiStEr\AppData\Local\Temp\junit9046728808271492140\DiskLruCacheTest is corrupt: unexpected journal line: DIRTY   k1, removing
DiskLruCache C:\Users\TWiStEr\AppData\Local\Temp\junit295402612387673084\DiskLruCacheTest is corrupt: unexpected journal header: [libcore.io.DiskLruCache, 0, 2, ], removing
DiskLruCache C:\Users\TWiStEr\AppData\Local\Temp\junit470842294550096577\DiskLruCacheTest is corrupt: unexpected journal header: [libcore.io.DiskLruCache, 1, 2, ], removing
DiskLruCache C:\Users\TWiStEr\AppData\Local\Temp\junit1727753644610746780\DiskLruCacheTest is corrupt: unexpected journal header: [libcore.io.DiskLruCache, 1, 1, ], removing
DiskLruCache C:\Users\TWiStEr\AppData\Local\Temp\junit1120994322779557734\DiskLruCacheTest is corrupt: unexpected journal header: [libcore.io.DiskLruCache, 1, 2, x], removing
DiskLruCache C:\Users\TWiStEr\AppData\Local\Temp\junit247601122073956340\DiskLruCacheTest is corrupt: unexpected journal line: BOGUS, removing
DiskLruCache C:\Users\TWiStEr\AppData\Local\Temp\junit8567702200724664690\DiskLruCacheTest is corrupt: unexpected journal line: [0000x001, 1], removing
DiskLruCache C:\Users\TWiStEr\AppData\Local\Temp\junit4251017206806448957\DiskLruCacheTest is corrupt: unexpected journal line: [1, 1, 1], removing
Tests run: 60, Failures: 5, Errors: 5, Skipped: 0, Time elapsed: 17.94 sec <<< FAILURE!
Running com.jakewharton.disklrucache.StrictLineReaderTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec

Results :

Failed tests:   evictOnUpdate(com.jakewharton.disklrucache.DiskLruCacheTest): expected:<[8]L> but was:<[9]L>
  editSameVersion(com.jakewharton.disklrucache.DiskLruCacheTest): expected:<'a[2]'> but was:<'a[]'>
  editSnapshotAfterChangeAborted(com.jakewharton.disklrucache.DiskLruCacheTest): expected:<'a[2]'> but was:<'a[]'>
  updateExistingEntryWithTooFewValuesReusesPreviousValues(com.jakewharton.disklrucache.DiskLruCacheTest): expected:<'[C]'> but was:<'[A]'>
  readAndWriteOverlapsMaintainConsistency(com.jakewharton.disklrucache.DiskLruCacheTest): expected:<'[CCcc]'> but was:<'[AAaa]'>

Tests in error:
  editSinceEvicted(com.jakewharton.disklrucache.DiskLruCacheTest): failed to delete C:\Users\TWiStEr\AppData\Local\Temp\junit7772052166884171923\DiskLruCacheTest\a.0
  aggressiveClearingHandlesWrite(com.jakewharton.disklrucache.DiskLruCacheTest): Unable to delete file: C:\Users\TWiStEr\AppData\Local\Temp\junit6349605855950167385\DiskLruCacheTest\journal
  aggressiveClearingHandlesEdit(com.jakewharton.disklrucache.DiskLruCacheTest): Unable to delete file: C:\Users\TWiStEr\AppData\Local\Temp\junit650626927180218018\DiskLruCacheTest\journal
  aggressiveClearingHandlesPartialEdit(com.jakewharton.disklrucache.DiskLruCacheTest): Unable to delete file: C:\Users\TWiStEr\AppData\Local\Temp\junit3713838826338339862\DiskLruCacheTest\journal
  aggressiveClearingHandlesRead(com.jakewharton.disklrucache.DiskLruCacheTest): Unable to delete file: C:\Users\TWiStEr\AppData\Local\Temp\junit7743179828843848086\DiskLruCacheTest\journal

Tests run: 61, Failures: 5, Errors: 5, Skipped: 0

Maybe journal is not closed somewhere, or just a plain Windows bug?

TWiStErRob avatar Aug 24 '14 20:08 TWiStErRob

Strange. I don't have Windows to test on but your case seems valid for checking the return value.

JakeWharton avatar Sep 03 '14 03:09 JakeWharton

If the rename does fail, we're in an awkward place. We can't wait for the previous file to be closed (that could take forever) and we can't delete it either. We could mark the entry as being dead in the journal so that stale data isn't read, but that leaks disk.

The Windows file system model pretty much breaks DiskLruCache completely.

swankjesse avatar Sep 05 '14 11:09 swankjesse

I think as this library was made for Android, and Android is a Linux distribution, and Linux is a Unix system, you can expect it to work properly on any Unix system, but, unfortunately, nothing is guaranteed to Windows.

ghost avatar Sep 07 '14 21:09 ghost

I totally agree with that, but at the same time I should be able to build this library from sources... without failing tests anywhere where I have Java + Android SDK.

However ignoring a well documented error scenario is still a valid issue here. I never said I want to use it in a Windows app.

TWiStErRob avatar Sep 07 '14 22:09 TWiStErRob

Hi @TWiStErRob, could you test with the original version of DiskLruCache from Android Open Source Project in http://grepcode.com/file_/repository.grepcode.com/java/ext/com.google.android/android-apps/4.4.4_r1/com/example/android/bitmapfun/util/DiskLruCache.java/?v=source? Thanks in advance :smile:

ghost avatar Sep 08 '14 01:09 ghost

Hmm, I just realized how much @sjudd had modified the library, this invalidates my first code example in the issue.

The missing checks are still missing though, and I can't package DiskLruCache which probably won't be fixed even after the checks are there...

With the above file you sent I did the following (I'm not sure this is what you wanted):

  • clone this repo and replace DiskLruCache.java
  • comment out test that are not compiling (anything using journalBkpFile and setMaxSize)
  • comment out assertions that are not compiling (snapshot.getLength)
  • add @Test(timeout=10000) to some tests because they're just spinning (while (true))
    rebuildJournalOnRepeatedReads created a 14MB journal in 5 minutes, then I killed the java process

The output is very similar to what I've quoted above.

as @swankjesse said:

The Windows file system model pretty much breaks DiskLruCache completely.

Probably there's nothing much to do, except maybe adding the checks to have better error reporting if anything breaks randomly on Android/Unix (we can't expect .delete() and .renameTo() to return true all the time I guess). Same stands for the timeout= for those tests, they don't have a failure condition, they either pass or run forever.

TWiStErRob avatar Sep 08 '14 09:09 TWiStErRob