
Panic when reading large chunk sizes

Open lulf opened this issue 10 months ago • 3 comments

After the max chunk size feature, there is still an issue where the PageWriter can write chunks larger than the max chunk size.

I attempted a fix here https://github.com/embassy-rs/ekv/compare/write-max-chunk-size?expand=1 - but I don't think it is a good one, because it would waste a whole page per chunk.

Looking at FileWriter::write, it looks like it just calls write() in a loop, but never writes a chunk header between the writes.

One solution could then be to commit each chunk in the FileWriter::write() loop, or build this into the PageWriter::write() itself. I'm not sure what the best solution is, or if calling commit() inside the write will somehow break some atomicity properties.
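
For illustration, here is a tiny self-contained model of the "commit each chunk in the loop" option. ChunkWriter, file_write and the MAX_CHUNK_SIZE value below are made-up stand-ins, not ekv's actual FileWriter/PageWriter API.

```rust
// Toy stand-in for PageWriter: commit() is what turns the pending bytes into a
// visible chunk (in ekv this is where the chunk header would get written).
const MAX_CHUNK_SIZE: usize = 256;

#[derive(Default)]
struct ChunkWriter {
    chunks: Vec<Vec<u8>>,
    pending: Vec<u8>,
}

impl ChunkWriter {
    fn write(&mut self, data: &[u8]) {
        self.pending.extend_from_slice(data);
    }
    fn commit(&mut self) {
        self.chunks.push(std::mem::take(&mut self.pending));
    }
}

// The "commit each chunk in the FileWriter::write() loop" option: split the
// input so no single chunk exceeds MAX_CHUNK_SIZE. The open question above is
// whether these intermediate commits break the all-or-nothing guarantee.
fn file_write(w: &mut ChunkWriter, mut data: &[u8]) {
    while !data.is_empty() {
        let n = data.len().min(MAX_CHUNK_SIZE);
        w.write(&data[..n]);
        w.commit();
        data = &data[n..];
    }
}
```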

lulf avatar Apr 26 '24 09:04 lulf

Yeah, I hit that issue too and needed to revert back to a 4095 max chunk size. I don't yet have experience with the internals of ekv, but could ekv use a page partially and fill it up as new chunks are written? Example: with page-size 4096 and max-chunk-size 256, a page can hold 16 chunks. Since we know the erase value, we can erase a page and write a single chunk to it, let's call it index 1. When the next chunks, indexes 2-4, are written, they are added to the page already in use by chunk 1. When a chunk is overwritten, it is moved to a new fresh page and its old slot on the first page is marked as "trashed". A page is freed only when all chunk slots in it are marked as "trashed", which means it can be erased. Not sure if that makes sense or even fits into ekv without rewriting it from scratch. The flash I am using does allow writing 256-byte slots within a 4096-byte page (4096 being its erase size).
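
A rough sketch of the bookkeeping this proposal implies, using the 4096/256 numbers above and otherwise invented names (this is not how ekv works today):

```rust
// A 4096-byte erase page split into 256-byte chunk slots gives 16 slots.
const PAGE_SIZE: usize = 4096;
const CHUNK_SLOT_SIZE: usize = 256;
const SLOTS_PER_PAGE: usize = PAGE_SIZE / CHUNK_SLOT_SIZE; // 16

#[derive(Clone, Copy, PartialEq)]
enum Slot {
    Erased,  // never written since the last erase
    Live,    // holds a current chunk
    Trashed, // chunk was overwritten and moved to a fresh page
}

struct PageSlots([Slot; SLOTS_PER_PAGE]);

impl PageSlots {
    // The page may only be erased once no slot still holds live data.
    fn can_erase(&self) -> bool {
        self.0.iter().all(|s| *s != Slot::Live)
    }
}
```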

dragonnn avatar Apr 26 '24 12:04 dragonnn

if calling commit() inside the write will somehow break some atomicity properties.

yes, PageWriter needs to ensure nothing is persisted until you .commit(). This is why write() writes data, but not the chunk header until .commit().

One way to ensure that while still allowing splitting chunks is:

  • Leave hole for 1st chunk header
  • Write 1st chunk's data.
  • When the 1st chunk is full, "save" the info of the chunk header to a field in PageWriter. Do not write it yet.
  • Leave hole for 2nd chunk header
  • Write 2nd chunk's data.
  • When the 2nd chunk is full, write the 2nd chunk header.
  • Leave hole for 3rd chunk header
  • Write 3rd chunk's data.
  • When the 3rd chunk is full, write the 3rd chunk header.
  • Leave hole for 4th chunk header
  • Write 4th chunk's data.
  • On commit, write chunk header for the 4th chunk first, THEN write the header for the 1st chunk.

So, before commit: [hole for 1st header] [1st data] [2nd header] [2nd data] [3rd header] [3rd data] [hole for 4th header] [4th data]

If power fails at this point, a PageReader will stop before the 1st chunk, when it sees the 1st hole. So it's fine to have valid chunk headers later in the page; those will be ignored. (Note that when appending to an existing page, there may be valid chunks before the 1st new chunk. We want those to stay valid, and no newly-written but not-yet-committed chunks to become valid.)

It's important on commit to write the last chunk header first. Otherwise it could end up like this if power fails during commit: [1st header] [1st data] [2nd header] [2nd data] [3rd header] [3rd data] [hole for 4th header] [4th data]. I.e. data for chunks 1-3 is committed, but not for chunk 4. We shouldn't allow this; either all or none of the data must get committed.

And, after commit we get: [1st header] [1st data] [2nd header] [2nd data] [3rd header] [3rd data] [4th header] [4th data]. All data committed.

This requires only a bounded amount of RAM: just the 1st chunk's header.
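
To make the ordering concrete, here is a minimal in-memory sketch of the scheme, assuming an invented 2-byte length header and a plain byte slice standing in for flash. None of the names below are the real ekv PageWriter API.

```rust
const ERASED: u8 = 0xFF;
const HEADER_SIZE: usize = 2; // toy header: little-endian chunk data length

struct ToyPageWriter<'a> {
    page: &'a mut [u8],
    cursor: usize,
    // The 1st chunk's header, kept only in RAM until commit(): writing it is
    // what makes the whole run of new chunks visible to a reader.
    deferred_first: Option<(usize, u16)>,
    // Offset of the header hole for the chunk currently being filled.
    open_header: Option<usize>,
    open_len: u16,
}

impl<'a> ToyPageWriter<'a> {
    fn new(page: &'a mut [u8]) -> Self {
        page.fill(ERASED);
        Self { page, cursor: 0, deferred_first: None, open_header: None, open_len: 0 }
    }

    // Start a chunk: leave a hole where its header will go.
    fn open_chunk(&mut self) {
        self.open_header = Some(self.cursor);
        self.cursor += HEADER_SIZE;
        self.open_len = 0;
    }

    // Append data to the current chunk (bounds/MAX_CHUNK_SIZE checks omitted).
    fn write(&mut self, data: &[u8]) {
        self.page[self.cursor..self.cursor + data.len()].copy_from_slice(data);
        self.cursor += data.len();
        self.open_len += data.len() as u16;
    }

    // Finish the current chunk. The 1st chunk's header is only remembered in
    // RAM; later chunks get their header written immediately, which is safe
    // because a reader stops at the 1st chunk's still-erased header.
    fn close_chunk(&mut self) {
        let at = self.open_header.take().expect("no open chunk");
        if self.deferred_first.is_none() {
            self.deferred_first = Some((at, self.open_len));
        } else {
            self.page[at..at + HEADER_SIZE].copy_from_slice(&self.open_len.to_le_bytes());
        }
    }

    // Commit: close the last chunk first, THEN fill in the 1st chunk's header.
    // If power fails anywhere before that final header write, none of the new
    // chunks become visible.
    fn commit(&mut self) {
        if self.open_header.is_some() {
            self.close_chunk();
        }
        if let Some((at, len)) = self.deferred_first.take() {
            self.page[at..at + HEADER_SIZE].copy_from_slice(&len.to_le_bytes());
        }
    }
}

fn main() {
    let mut page = [ERASED; 64];
    let mut w = ToyPageWriter::new(&mut page);
    w.open_chunk(); w.write(b"first");  w.close_chunk(); // header deferred in RAM
    w.open_chunk(); w.write(b"second"); w.close_chunk(); // header written now
    w.open_chunk(); w.write(b"third");                   // chunk still open
    w.commit(); // writes the 3rd header, then the 1st header: all-or-nothing
    assert_eq!(&page[..2], &5u16.to_le_bytes()[..]); // 1st header = b"first".len()
}
```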

Dirbaio avatar Apr 26 '24 12:04 Dirbaio

Actually, maybe it's possible to change things in higher layers to avoid the "PageWriter needs to ensure nothing is persisted until you .commit()." requirement. That'd make PageWriter simpler: it'd become just "write data, write header, write data, write header...", with no need for "save the 1st chunk header for last".

This is the only place that requires the "PageWriter needs to ensure nothing is persisted until you .commit()." guarantee: https://github.com/embassy-rs/ekv/blob/3ce370e7e5e5a904e95f52bedaaa3352cf238b22/src/file.rs#L659-L665

That's guaranteed to write at most FILE_COUNT * FileMeta::SIZE bytes. If that's <= MAX_CHUNK_SIZE, then we're guaranteed that write is atomic even if longer writes aren't.

I don't like that this puts a "floor" on MAX_CHUNK_SIZE that goes up with flash memory size... but it goes up only logarithmically, while we already have things that go up linearly (the page allocator...), so perhaps it's okay.
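
If this route were taken, the floor could at least be checked at compile time. A sketch with placeholder numbers, where FILE_META_SIZE stands in for FileMeta::SIZE and none of the values are ekv's real ones:

```rust
// Placeholder values so the assertion compiles standalone; the real constants
// live inside ekv and will differ.
const FILE_COUNT: usize = 32;
const FILE_META_SIZE: usize = 12; // stand-in for FileMeta::SIZE
const MAX_CHUNK_SIZE: usize = 4096;

// The meta write is at most FILE_COUNT * FILE_META_SIZE bytes; keeping that
// within MAX_CHUNK_SIZE keeps it a single chunk write, and therefore atomic.
const _: () = assert!(FILE_COUNT * FILE_META_SIZE <= MAX_CHUNK_SIZE);
```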

Dirbaio avatar Apr 26 '24 13:04 Dirbaio

fixed in #9

Dirbaio avatar May 02 '24 14:05 Dirbaio