DataTanker
DataTanker copied to clipboard
RecoveryFile is horribly broken
So, I was curious how the recovery file stuff was actually implemented, and found some grave bugs which completely disable the feature always, and if it wasn't, would actually cause more data corruption.
For starters CorrectlyFinished
is always false, because it tries to read beyond the stream end (Stream.Seek(finalRecordSize, SeekOrigin.End);
instead of -finalRecordSize
).
https://github.com/VictorScherbakov/DataTanker/blob/ac533a63f16800a0aa4ace3db8479f892efa09df/DataTanker/Core/Recovery/RecoveryFile.cs#L235
Next, it does not actually read the content of update records:, as it calls the BlockingRead()
variant that expects a size and returns the buffer (and ignores the return value), so content
is never filled, thus potentially "updating" a page with all zeroes.
https://github.com/VictorScherbakov/DataTanker/blob/ac533a63f16800a0aa4ace3db8479f892efa09df/DataTanker/Core/Recovery/RecoveryFile.cs#L318-L319
Finally, replaying the recovery records will fail, as the recovery is attempted before Storage.IsOpen
is set to true
, thus when tying to e.g. update a page a chain of PageExists -> CheckStorage -> not open -> throw new InvalidOperationException
occurs, and boom.
Fixing these issue should at least make the basic recovery functionality work.
I added a test to my local fork (that has Recovery renamed to Journal, among other things), that you maybe can adapt.
[TestFixture]
public class JournalTests : FileSystemStorageTestBase
{
public JournalTests()
{
StoragePath = Path.Combine(Directory.GetParent(Assembly.GetExecutingAssembly().Location).FullName,
"..\\..\\Storages");
}
[Test]
public void RecoverJournal()
{
long idx;
long deleteIndex;
using (var man = new FileSystemPageManager(4096))
using (var storage = new Storage(man)) {
storage.Open(StoragePath, PersistentStorageOpenMode.Create);
var page = man.CreatePage();
idx = page.Index;
deleteIndex = man.CreatePage().Index;
}
using (var man = new FileSystemPageManager(4096))
using (var storage = new Storage(man)) {
storage.Open(StoragePath, PersistentStorageOpenMode.OpenExisting);
var page = man.FetchPage(idx);
Assert.AreEqual(page.Index, idx);
Assert.AreNotEqual(page.Content[10], 0xff);
Assert.True(man.PageExists(deleteIndex));
}
string file;
using (var man = new FileSystemPageManager(4096))
using (var storage = new Storage(man))
using (var journal = new JournalFile(man, man.PageSize)) {
storage.Open(StoragePath, PersistentStorageOpenMode.OpenExisting);
var page = man.FetchPage(idx);
idx = page.Index;
man.Dispose();
Assert.AreEqual(page.Index, idx);
Assert.AreNotEqual(page.Content[10], 0xff);
journal.WriteDeletePageRecord(deleteIndex);
// Update twice with different pages to make sure the latest one wins
page.Content[9] = 0xaa;
page.Content[10] = 0xaa;
journal.WriteUpdatePageRecord(page);
page.Content[10] = 0xff;
journal.WriteUpdatePageRecord(page);
journal.WriteFinalMarker();
file = journal.JournalFileName;
}
Assert.True(File.Exists(file));
Assert.GreaterOrEqual(new FileInfo(file).Length, 1);
Assert.AreNotEqual(idx, 0);
using (var man = new FileSystemPageManager(4096))
using (var storage = new Storage(man)) {
storage.Open(StoragePath, PersistentStorageOpenMode.OpenExisting);
var page = man.FetchPage(idx);
Assert.AreEqual(page.Index, idx);
Assert.AreEqual(page.Content[9], 0xaa);
Assert.AreEqual(page.Content[10], 0xff);
Assert.False(man.PageExists(deleteIndex));
Assert.AreEqual(man.CreatePage().Index, deleteIndex);
}
}
}
One more bug:
Consider the following chain of events:
- Page removed (Delete record written)
- Page resurrected
- Page updated (Update record written)
When this is replayed from the recovery file, the logic will first update the page (as page updates will be applied first) then the page gets deleted.
A potential fix is in ReadAllRecords
when an update record is discovered, it should drop the page index from DeletedPageIndexes
again, analog to dropping earlier Update records after a Delete record is encountered.
Great contribution as always!
RecoveryFile really lacked tests. Thank you for the first one! I hope this function works properly now.