
LittleFS file caching

Open atanisoft opened this issue 3 years ago • 59 comments

I've encountered an odd case where the caching in LittleFS seems to be causing an issue that I'm not seeing on SD or SPIFFS.

Using a code pattern similar to the one below, the second file handle does not see the data written through the first file handle:

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

uint8_t buf1[1] = {'a'};
uint8_t buf2[1] = {0};
int fd1 = open("/fs/file.bin", O_RDWR);
assert(fd1 >= 0);
int fd2 = open("/fs/file.bin", O_RDWR);
assert(fd2 >= 0);
lseek(fd1, 0, SEEK_SET);
write(fd1, buf1, sizeof(buf1));
fsync(fd1);                       // flush fd1's write to storage
lseek(fd2, 0, SEEK_SET);
read(fd2, buf2, sizeof(buf2));    // fd2 should now observe fd1's write
assert(buf1[0] == buf2[0]);       // fails on LittleFS

Using two handles for the same file doesn't make a ton of sense in most cases, especially within the same function, but this pattern works on other FS types.

Any ideas or suggestions on config tweaks to reduce the chances of the two file handles being out of sync?

atanisoft avatar Nov 08 '20 13:11 atanisoft

investigating

BrianPugh avatar Nov 09 '20 02:11 BrianPugh

So I'm working on branch:

https://github.com/joltwallet/esp_littlefs/tree/multi-fd-sync

I added your code snippet as a unit test, and there is some non-determinism going on. It usually works the first time but fails the second time. Trying to figure out if it's a problem in my port or an upstream issue.

BrianPugh avatar Nov 09 '20 04:11 BrianPugh

Interesting, is it using a cache per file handle, or per file in the FS? If it is per handle, is it possible the caches are getting out of sync?

atanisoft avatar Nov 09 '20 10:11 atanisoft

so within my port, the function lfs_file_open will be called twice, and two unique file descriptors will be returned. This means there are also two independent lfs_file_t structures. I'm looking through the upstream lfs code right now, and I really can't figure out why this works sometimes. I can't figure out why, the first time this code snippet is run, the (relatively) independent second lfs_file_t structure gets updated with the new file length. Still looking.

BrianPugh avatar Nov 09 '20 16:11 BrianPugh

operating under the assumption that upstream littlefs cannot handle the same file being open multiple times (waiting to hear back from their authors), I can add this functionality to my port relatively easily. However, with the easy way to implement this, opening the same file multiple times would return the same small-int-value file descriptor. Would this be acceptable? For example:

uint8_t buf1[1] = {'a'};
uint8_t buf2[1] = {0};
int fd1 = open("/fs/file.bin", O_RDWR);     // let's say this returns 3
assert(fd1 >= 0);
int fd2 = open("/fs/file.bin", O_RDWR);     // also returns 3
assert(fd2 >= 0);
lseek(fd1, 0, SEEK_SET);
write(fd1, buf1, sizeof(buf1));
fsync(fd1);
lseek(fd2, 0, SEEK_SET);
read(fd2, buf2, sizeof(buf2));
assert(buf1[0] == buf2[0]);

close(fd1);           // each descriptor still needs to be closed, or really "close" just needs to be called the same number of times on any of the descriptors
close(fd2);

BrianPugh avatar Nov 09 '20 16:11 BrianPugh

Perhaps track the lfs_file_t instance and return the same one from your API in littlefs_vfs_open()? If that is what you are proposing, I think it should be fine to return the same FD back to the caller (the ESP VFS). The ESP VFS layer will assign a unique FD and translate it back to the VFS implementation's FD automatically.

I'd suggest tracking N usages so your VFS adapter holds onto the lfs_file_t until the last FD has been closed.
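
A minimal sketch of that reference-counting idea (purely illustrative; helper names like lookup_open_file, fd_for, and file_for are hypothetical, not part of the actual esp_littlefs API, and a mounted lfs_t named lfs is assumed):

typedef struct open_file {
    lfs_file_t file;      /* the single shared littlefs handle */
    uint8_t open_count;   /* number of FDs referring to this file */
    char *path;           /* stored path, used to detect duplicate opens */
} open_file_t;

static int vfs_open(const char *path, int flags, int mode)
{
    open_file_t *f = lookup_open_file(path);  /* hypothetical: search list of open files */
    if (f == NULL) {
        f = allocate_and_open(path, flags);   /* hypothetical: wraps lfs_file_open() */
        if (f == NULL) return -1;
    }
    f->open_count++;                          /* another FD now shares this file */
    return fd_for(f);                         /* same small int for every open of this path */
}

static int vfs_close(int fd)
{
    open_file_t *f = file_for(fd);            /* hypothetical: FD-to-struct lookup */
    if (--f->open_count == 0) {               /* last FD: actually close the file */
        lfs_file_close(&lfs, &f->file);
        free(f->path);
        free(f);
    }
    return 0;
}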

atanisoft avatar Nov 09 '20 17:11 atanisoft

that's exactly how I was going to implement it 👍 . I can work on that more this evening.

It seems simpler to have a count in the vfs_littlefs_file_t struct rather than having a unique fd and searching the linked list to see if another instance of the same file is open. Actually, the more I think about it, it might not be too hard to just do this "properly" with unique file descriptors. I'll investigate that as well.

BrianPugh avatar Nov 09 '20 17:11 BrianPugh

Once you have something ready to test I can test it in my application and see how it performs. If you have a CAN-USB adapter for your PC and a CAN transceiver you can test it locally yourself possibly as well.

atanisoft avatar Nov 09 '20 17:11 atanisoft

@atanisoft I implemented (and merged) the lazy solution where the same small-integer FD is returned. It required significantly less code change, and doing it the "proper" way may have some performance impact on people that really push the file system. Let me know if this solution works for you (it passes the unit test). Thanks!

See: https://github.com/joltwallet/esp_littlefs/pull/19

BrianPugh avatar Nov 10 '20 03:11 BrianPugh

I put one comment on #19, but it won't impact my testing of the fix. I'll do some testing in the full setup and confirm there are no issues.

atanisoft avatar Nov 10 '20 12:11 atanisoft

This appears to have fixed the caching issue I was seeing. More work will be needed in the dependency that opens the same file multiple times, so that it can be consolidated to a single file handle for all flows, but that is a larger effort.

atanisoft avatar Nov 10 '20 13:11 atanisoft

I can't see your comment on the PR, are you sure you submitted it?

Also, after sleeping on it, I realized that this fix doesn't totally solve your issue, unless your use case is exactly reflected in that snippet. With the current solution, each individual file descriptor doesn't have its own position in the file. Thinking of a way around this...

BrianPugh avatar Nov 10 '20 16:11 BrianPugh

With the current solution, each individual file descriptor doesn't have its own position in the file. Thinking of a way around this...

Yeah, I was thinking of the same and was considering commenting on it as well.

I can't see your comment on the PR, are you sure you submitted it?

Somehow it got stuck in a Pending state...

atanisoft avatar Nov 10 '20 16:11 atanisoft

alright, I have a greater understanding of the upstream lfs codebase now, and I also realized my test would pass the first time due to artifacts left behind from a previous failed test. Now that I know nothing magical is going on behind the scenes (like the lfs file struct magically getting updated), I can tackle this a bit better.

BrianPugh avatar Nov 10 '20 17:11 BrianPugh

Alright, here is my new attack plan:

  1. Create a new struct like:

typedef struct _ {
    lfs_file_t file;
    uint8_t open_count;
} vfs_littlefs_file_wrapper_t;

With this new setup, every open gets a unique FD (as it should), but multiple FDs can point back to the same vfs_littlefs_file_wrapper_t.

  2. Replace vfs_littlefs_file_t.file with a pointer to the struct described in (1). Add a new attribute lfs_off_t pos to vfs_littlefs_file_t. This will keep track of this file descriptor's position in the file.
  3. Before every file operation, run lfs_file_seek(..., vfs_littlefs_file_t.pos, LFS_SEEK_SET) if it mismatches the lfs_file_t.pos attribute. This function just flushes any writes held in memory and then sets the internal pos attribute, so it should be a very cheap operation.
  4. After every file operation, copy lfs_file_t.pos over to vfs_littlefs_file_t.pos.
  5. As currently, on every close, decrement the open_count and deallocate the vfs_littlefs_file_wrapper_t when it reaches 0. (See the sketch below for how steps 2-4 fit together.)
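
A sketch of steps 2-4 (illustrative only, assuming a mounted lfs_t named lfs; the real implementation would hang off the esp-idf VFS read/write callbacks):

typedef struct {
    vfs_littlefs_file_wrapper_t *shared;  /* refcounted wrapper from step 1 */
    lfs_off_t pos;                        /* this descriptor's own position in the file */
} vfs_littlefs_file_t;

static lfs_ssize_t fd_read(vfs_littlefs_file_t *fd, void *buf, lfs_size_t len)
{
    lfs_file_t *lf = &fd->shared->file;
    /* step 3: re-seek if another descriptor moved the shared position */
    if (lfs_file_tell(&lfs, lf) != (lfs_soff_t)fd->pos) {
        lfs_file_seek(&lfs, lf, fd->pos, LFS_SEEK_SET);
    }
    lfs_ssize_t n = lfs_file_read(&lfs, lf, buf, len);
    /* step 4: remember where this descriptor now points */
    if (n >= 0) {
        fd->pos = lfs_file_tell(&lfs, lf);
    }
    return n;
}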

How does this plan sound?

Possible downsides:

  1. Each fopen now requires an additional malloc, which would increase memory usage per FD by 4 bytes for the pointer, a few bytes of allocator overhead (not sure how much esp-idf's malloc adds), and 4 more bytes for the aligned open_count. All in all, let's say this increases memory overhead by roughly 12 bytes per FD.

BrianPugh avatar Nov 10 '20 18:11 BrianPugh

I think overall it sounds good, but do add some form of locking to ensure no two threads are modifying the lfs_file_t concurrently.

atanisoft avatar Nov 10 '20 18:11 atanisoft

all the above operations would be done while the filesystem mutex is obtained, so there should be no issues.

BrianPugh avatar Nov 10 '20 18:11 BrianPugh

@X-Ryl669 do you have any thoughts on this? You have a keen eye for these sorts of things.

BrianPugh avatar Nov 10 '20 18:11 BrianPugh

Honestly, this is a lot of change for a borderline-valid usage pattern (so it adds a lot of potential bugs and multitasking issues). If I were you, I would:

  1. If the filenames are stored in the structures (this depends on some build flags, IIRC),
  2. Prevent opening an already-open file by searching for it: return -1 and set errno to EBUSY when a caller tries to open a file that is already open. This means adding a hash check against the open-file hash table (an acceptable cost, IMHO). No bugs, no multitasking issues. This is the option I would prefer; a sketch follows below.
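
A rough sketch of that rejection approach (is_already_open is a hypothetical helper standing in for the hash-table lookup):

#include <errno.h>

static int vfs_open(const char *path, int flags, int mode)
{
    if (is_already_open(path)) {  /* hypothetical: hash check against the open-file table */
        errno = EBUSY;            /* refuse a second open of the same file */
        return -1;
    }
    /* ... fall through to the normal single-open path ... */
}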

Please note that the file might not be opened with the same attributes each time (for example, once in read mode and later in read/write mode). In that case, you can't reuse the file descriptor or the file_t object. What could you do then? Close the file and reopen it with the new mode (or a combined mode)? If you do that, suddenly a read-only file descriptor will be allowed to write!

So unless it's fixed upstream, there is no valid solution that's not a hack like the proposed implementation above, IMHO.

X-Ryl669 avatar Nov 11 '20 08:11 X-Ryl669

Note that this is a perfectly valid use case: multiple file handles on the same file from different threads. It works on other FS types (including SPIFFS).

But you are right, the multiple file handles would potentially have different mode flags which would need to be accounted for in the read/write implementations.

atanisoft avatar Nov 11 '20 10:11 atanisoft

I never had the need for this. Anyway, I don't think the HAL is the right layer to implement this if the underlying driver does not, since in all cases it'll lead to bugs that are almost impossible to solve here (I'm not even talking about partial writes, truncate, and all the other oddities). Hopefully, Brian has already reported the bug to the LittleFS developers, so it'll likely be fixed in the right place, IMHO: the filesystem code. Meanwhile, you can already query, in your code, for an existing file descriptor via its path and, only if that fails, open it.

X-Ryl669 avatar Nov 11 '20 11:11 X-Ryl669

I never had the need for this.

Yeah, this would definitely be considered a corner case.

I don't think the HAL should be the right layer to implement this if the underlying driver does not

I suspect the underlying FS code does support this but it may be dependent on using the same lfs_file_t or enforcing some sort of cache synchronization method.

Meanwhile, you can already query, in your code, for a existing file descriptor via its path and if it fails, open it.

If that is how this is implemented, LittleFS will no longer be an option for a number of projects I know of. If there were a way to disable or reduce the cache size, that might be a better option than throwing an error when attempting to open the same file in multiple code paths.

atanisoft avatar Nov 11 '20 12:11 atanisoft

I suspect the underlying FS code does support this but it may be dependent on using the same lfs_file_t or enforcing some sort of cache synchronization method.

No, this is not possible, since the open options can be different, the current position in the file will have to be different, and so on (think about ftruncate, for example).

If that is how this is implemented, LittleFS will no longer be an option for a number of projects I know of.

I don't know about your project. I understand how much simpler it seems not to care about the file path when calling open. Yet there is a can of worms hiding in this simplicity, and I really doubt SPIFFS is acting correctly in that case (and FatFS certainly does not, even if it seems to work for a simple test case).

In your example it seems to work, but it'll break as soon as the operations happen in different tasks, unless you have one big lock/mutex around all your FS operations; that's not how ESP-IDF is implemented, since it would really hurt performance.

This is because the underlying write to flash is not atomic once it touches multiple pages. So you might have a situation where Task X writes "A..B..C" and Task Y reads "A.." or "A..B" or "A..B..C" or the unexpected "A.. ..C", since writing the B page has not finished yet. And what happens if one task seeks to B while the other truncates to A?

Honestly, I don't know about your software, but if it relies on atomic operations on the FS, I don't think it'll work correctly on ESP-IDF anyway, whatever the FS.

X-Ryl669 avatar Nov 11 '20 16:11 X-Ryl669

these are good points. Alright, I think my action plan on this (for now) is to:

  1. Revert PR 19 on master in the meantime, since it's a half-assed attempt that doesn't really solve the problem.
  2. Continue working on PR 21. It's a mostly working solution right now; I just want to polish it up a bit and test it more. As @X-Ryl669 said, this won't properly handle modes. I could add that in the same way I handled file position, but I also agree that these features shouldn't be added in my layer. That said, as I was going through the LFS codebase, I think the chances are slim that this is/will be properly supported at the littlefs level. I don't think files really know of each other at the root lfs layer.
  3. Reorganize (2) a little bit so we can enable/disable this feature.

BrianPugh avatar Nov 11 '20 17:11 BrianPugh

The issue of atomicity is not a problem in the code I'm using, and it works correctly on SPIFFS and FatFS (SD) as well as on other platforms (mainly Linux).

I'll take a look at PR 21; as long as it allows the file to be opened multiple times, it should be fine. It can be up to the consumer code to ensure atomicity is preserved across tasks if/when needed.

atanisoft avatar Nov 11 '20 17:11 atanisoft

Hello, I left some comments in the littlefs thread (https://github.com/littlefs-project/littlefs/issues/483#issuecomment-727597870). TLDR: littlefs files are transactional in the sense that each open file gets an independent snapshot of the file's contents.
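
To illustrate the snapshot behavior described above (a sketch against the raw littlefs API, assuming an already-mounted lfs_t named lfs):

lfs_file_t fa, fb;
lfs_file_open(&lfs, &fa, "file.bin", LFS_O_RDWR);
lfs_file_open(&lfs, &fb, "file.bin", LFS_O_RDONLY);

uint8_t w = 'a', r = 0;
lfs_file_write(&lfs, &fa, &w, 1);
lfs_file_sync(&lfs, &fa);         /* commits fa's data to disk...                */
lfs_file_read(&lfs, &fb, &r, 1);  /* ...but fb still reads from its own snapshot */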

What is the use case for opening the same file multiple times? Is it primarily a usability thing to keep track of multiple file positions?

geky avatar Nov 15 '20 17:11 geky

@geky it's a small bug in a library I'm using, but it works on all FS types except LittleFS from what I've tested so far. It would seem there should be a way to flush the caches even across multiple file handles, so that data written through one handle can be read through another after a flush operation.

atanisoft avatar Nov 15 '20 17:11 atanisoft

So it's hard to determine what the best path forward is here. There are multiple possible solutions, but they're all sort of hacky/weird.

BrianPugh avatar Nov 15 '20 17:11 BrianPugh

It is an exciting case where existing practices don't fit the embedded world well.

It's interesting to note that SPIFFS uses an internal file descriptor table: https://github.com/pellepl/spiffs/blob/9d12e8f47b0e611f277a0de8d39dc9089971ce20/src/spiffs.h#L385-L386

Also interesting to note Chan's FATFS prohibits this: http://elm-chan.org/fsw/ff/doc/appnote.html#dup

geky avatar Nov 15 '20 17:11 geky

Yeah, it is an interesting issue for sure. FatFS on esp32 does work when using an SD card for storage, though, as long as you flush after writes (something we are fixing in the library as well). But there will be a couple of cases that still open two handles to the same file for different purposes, with access from different threads.

atanisoft avatar Nov 15 '20 17:11 atanisoft