littlefs
Poor lfs_file_relocate() performance when a bad block is detected.
See also #535.
TLDR
lfs_file_relocate() copies data from a bad block to a new block one byte at a time. This is causing unpredictable lag (sometimes over 1500ms) when a block error is encountered while writing to a file.
Please consider:
- Changing lfs_file_relocate() to use a dat[] array for bulk transfers, similar to what lfs_bd_cmp() now does. The array size for these bulk transfers should be user-configurable, either as a #define in lfs.h or as a parameter in lfs_config.
Full Background
I have been investigating unstable lfs_file_write() performance on a Cortex-M4 processor while streaming data in a background thread from a buffer into an lfs file on flash. Our flash chip is 1Gbit, with 128KB blocks each composed of 64 2KB pages.
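For context, our lfs_config is roughly the following. The block geometry follows from the part described above; the driver callbacks are hypothetical names and the cache/lookahead/block_cycles values are illustrative placeholders, not our exact settings:

```c
// Rough sketch of the configuration for this part (1Gbit NAND,
// 128KB blocks, 2KB pages). Driver callbacks and tuning values
// are placeholders, not our exact settings.
const struct lfs_config cfg = {
    // block device driver callbacks (definitions omitted)
    .read  = flash_read,
    .prog  = flash_prog,
    .erase = flash_erase,
    .sync  = flash_sync,

    // geometry
    .read_size      = 2048,     // one 2KB page
    .prog_size      = 2048,     // one 2KB page
    .block_size     = 131072,   // 128KB erase block
    .block_count    = 1024,     // 1Gbit / 128KB
    .cache_size     = 2048,
    .lookahead_size = 128,
    .block_cycles   = 500,
};
```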
Our process needs to reliably write approximately 4KB of data to flash every 500ms. Most of the time this takes 5-10ms, but once in a while the buffer overflows because lfs takes more than 1000ms to write out the data. Profiling shows that flash page read and write operations reliably complete in 2ms or less on this hardware, so they are not the source of the problem.
It was discovered that whenever a bad block was found during lfs_file_write(), lfs_file_relocate() was called to copy the contents of the old block into a newly allocated block. On our hardware this operation takes as much as 1500ms to copy about 100KB of data. lfs_file_relocate() appears to copy only one byte at a time between the read and write caches, which adds significant overhead for this flash configuration.
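For reference, the hot loop in lfs_file_relocate() looks roughly like this (paraphrased from lfs.c, with the inline-file path and error handling trimmed; the exact cache arguments vary by version). Every single byte goes through a full lfs_bd_read()/lfs_bd_prog() call:

```c
// copy what exists of the old (bad) block into the new block,
// one byte per block-device read/prog call
for (lfs_off_t i = 0; i < file->off; i++) {
    uint8_t data;

    // read from the dirty file cache or disk
    int err = lfs_bd_read(lfs,
            &file->cache, &lfs->rcache, file->off - i,
            file->block, i, &data, 1);
    if (err) {
        return err;
    }

    // program into the freshly allocated block
    err = lfs_bd_prog(lfs,
            &lfs->pcache, &lfs->rcache, true,
            nblock, i, &data, 1);
    if (err) {
        return err;
    }
}
```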
After changing lfs_file_relocate() to use an array similar to what lfs_bd_cmp() now does, and sizing that array at 32 bytes, lfs_file_relocate() now completes in 200-300ms after a block error.
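The change is essentially just a small stack buffer so the same loop moves data in chunks. A sketch of the idea follows; the dat[] name mirrors lfs_bd_cmp(), LFS_BULK_XFER_SIZE is the new define from my patch, and this is illustrative rather than the exact diff:

```c
// copy in LFS_BULK_XFER_SIZE chunks instead of one byte at a time
uint8_t dat[LFS_BULK_XFER_SIZE];
for (lfs_off_t i = 0; i < file->off; i += sizeof(dat)) {
    // handle the partial chunk at the end of the copy
    lfs_size_t chunk = lfs_min(sizeof(dat), file->off - i);

    // read a chunk from the dirty file cache or disk
    int err = lfs_bd_read(lfs,
            &file->cache, &lfs->rcache, file->off - i,
            file->block, i, dat, chunk);
    if (err) {
        return err;
    }

    // program the same range into the freshly allocated block
    err = lfs_bd_prog(lfs,
            &lfs->pcache, &lfs->rcache, true,
            nblock, i, dat, chunk);
    if (err) {
        return err;
    }
}
```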
I understand that lfs tries to keep its stack footprint as low as possible and that 128KB blocks are a bit large, but I think the tradeoff between stack space and throughput in cases like this should be user-configurable.
I can post a PR if that will help the discussion. Am I missing anything important here?
Edit: Actually, a 256-byte buffer appears to be way overkill. Testing the same code with a 32-byte buffer yields nearly the same performance increase, so I have updated my description above.
If you have a PR with this change, I'd love to see it - I'd probably use it in my own version of littlefs where I'm seeing this same bottleneck come up occasionally
Making a PR...
The PR is up here: https://github.com/littlefs-project/littlefs/pull/686/files
You might play with the value of LFS_BULK_XFER_SIZE, but anything larger than 32 didn't really make it any faster for our use case. Keep in mind that lfs appears to try to minimize stack usage, so that is also a factor.
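The intent is that the chunk size is just a compile-time define you can override from your build flags; something along these lines (the exact mechanism is in the PR):

```c
// override at build time, e.g. -DLFS_BULK_XFER_SIZE=32
#ifndef LFS_BULK_XFER_SIZE
#define LFS_BULK_XFER_SIZE 32
#endif
```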
NOTE: Currently working on a unit test failure with odd chunk sizes. I'm having trouble reproducing that issue in practice on my end...
Thanks, I just applied your changes to my branch. I haven't had a chance to gather hard numbers, but anecdotally it does seem to help.
I'd happily give up some stack space if it means I can reduce my buffer size (which currently has to be very large to account for the periodic spikes in littlefs latency)
@ryangardner Unit tests are now passing for my PR. We did detect a lesser performance issue in lfs_bd_cmp(), but increasing its buffer size beyond 8 bytes causes unit tests to fail when rewriting files with odd block sizes. That doesn't happen in our application, however. I'll get review feedback, but for now it's working if you want to try it in your own fork.