littlefs Feature/non empty dir removal

For background, see #43. I figured we could add logic easily that prevented the orphan issue that @geky mentioned by scanning for files only in the folder, so this commit adds that logic. E.g. you can remove a folder so long as it doesn't contain other folders with one commit to the filesystem permitting several files to be removed at once.

May 12 '21 22:05 tim-nordell-nimbelink

One thing I wasn't sure about is the lfs_fs_preporphans(...) count added/removed. Technically we're removing multiple "blocks" at once from the allocated filesystem with this change so I wasn't sure if the lfs_fs_preporphans should reflect that, or if it's supposed to reflect the number of folders. I'm assuming not since I don't think lfs_dir_split(...) touchs lfs_fs_preporphans(...) at all.

I did miss updating the global state from each of the split pages initially, but that has been added in with the new commit block. I also had the linked list of directories slightly wrong, and propagated this up a hair.

May 13 '21 21:05 tim-nordell-nimbelink

I guess not actually testing it in the compile environment within littlefs was a mistake, nor running the test suite. It's fixed now. :) (I only tried it in my project on my embedded platform which had different compiler settings.)

Jun 11 '21 01:06 tim-nordell-nimbelink

Sorry about the delay between CI jobs, the new "workflow approval for first time contributors" requirement means I need to manually allow the CI to run, which I unfortunately probably won't be able to do promptly.

As far as I'm aware there's no way to give you CI permissions, that would be awfully convenient :/

It looks like a "reentrant" test is failing, which needs the -r flag (these test power cycling and take longer, make test TESTFLAGS+='-r' or ./scripts/test.py -r). This does suggest something might be going wrong with the orphan count, since "orphans" can only happen under powerloss.

Jun 12 '21 18:06 geky

I'm assuming not since I don't think lfs_dir_split(...) touchs lfs_fs_preporphans(...) at all.

Ah, lfs_dir_split is a bit special. Orphans are metadata-pairs that no longer belong to a directory, and when we split we can create the metadata-pair and put it in the directory's linked-list at the same time, so creating an orphan is not possible.

We need the orphan state for other operations, such as remove, where we need two commits to remove the metadata-pair from both the directory tree and the metadata linked-list.

For example if we remove C here:

          .----.
          | A  |-.
.---------|    |-'
|         '----'
|    .-----'| '-----.
|    v      v       v
| .----.  .----.  .----.
'>| B  |->| C  |->| D  |
  |    |  |    |  |    |
  '----'  '----'  '----'

Technically we're removing multiple "blocks" at once from the allocated filesystem with this change so I wasn't sure if the lfs_fs_preporphans should reflect that

lfs_fs_preporphans is a bit misleading here, on disk there is a single bit to say "has orphans" vs "does not have orphans". This is to avoid an expensive deorphan scan which usually is not needed. The deorphan scan keeps searching for orphans until it is sure there is no more orphans on the metadata-pair linked-list (here).

The reason there is a counter on the driver side is to keep track of how many orphans have been created by different operations so we know if we need to update the orphan bit in lfs_dir_commit.

Wear-leveling/bad-block removal also creates orphans, and can happen on any commit, so this gets a bit messy.

You will need to set the orphan bit, but you don't actually need to write the exact number to the disk, so you could "+1" for multiple and, and "-1" once you have removed the blocks from the linked-list. There is an assert in preporphans (here) that should help protect against mistakes here.

Jun 12 '21 19:06 geky

Sorry about the delay between CI jobs, the new "workflow approval for first time contributors" requirement means I need to manually allow the CI to run, which I unfortunately probably won't be able to do promptly.

As far as I'm aware there's no way to give you CI permissions, that would be awfully convenient :/

No worries - I saw that you had to manually run the CI. I wonder if you had pulled the branch into a temporary branch in the official repository if it'd then consider me to be a contributor, or if I need to be in the master history?

It looks like a "reentrant" test is failing, which needs the -r flag (these test power cycling and take longer, make test TESTFLAGS+='-r' or ./scripts/test.py -r). This does suggest something might be going wrong with the orphan count, since "orphans" can only happen under powerloss.

I'll check out these tests cases, thanks.

Jun 12 '21 22:06 tim-nordell-nimbelink

It looks like the branch doesn't work, I'm guessing it needs to be on master since this is what GitHub uses to update the contributors graph.

https://github.com/littlefs-project/littlefs/tree/tim-nordell-nimbelink-perm

Unfortunately they really don't have much documentation over this badge since it originally was just decorative.

Jun 14 '21 12:06 geky

Hi @geky -

It looks like a "reentrant" test is failing, which needs the -r flag (these test power cycling and take longer, make test TESTFLAGS+='-r' or ./scripts/test.py -r). This does suggest something might be going wrong with the orphan count, since "orphans" can only happen under powerloss.

Got it - I didn't realize there were additional tests hidden. I figured out the problem - I was walking the splits for the gstate, but I forgot to xor as I went along. (The fix is here for the test case that was failing: https://github.com/littlefs-project/littlefs/pull/557/commits/45d02e5750557a7646311c383eb6941bf429a187.) I haven't squashed the fixups, but I figured I'd leave them in there this time in case we didn't want any of those additional changes.

I moved the core logic into a helper function, and am passing that as a pointer down into the sub routines. I wasn't sure if you'd want that minor code cost always there or not - e.g. this way it can implicitly be dropped by the linker if a user doesn't use the 2 new user-facing routines.

I haven't added any recursive top-level routines as-of-yet, but I think any recursion due to the linked list nature needs to be done at a layer above what I implemented. However, any recursive routine can now be a bit faster than it could have been previously if added at some point.

It looks like the branch doesn't work, I'm guessing it needs to be on master since this is what GitHub uses to update the contributors graph.

Oh well, I figured it might be worth a shot.

Would you like me to squash the fixups? Anything you'd like changed in the changeset? All of the current test cases now pass (at commit 45d02e5 which is before I split off the function name). I wasn't sure if you wanted any of the test cases moved to the new function names or not, so none of the tests have been changed. I did make a duplicate of the test file that had failed that invokes the removeall function instead.

Thanks, Tim

Jun 14 '21 12:06 tim-nordell-nimbelink

I just realized there are some stylistic things I need to conform to the codebase too. For instance, I'm usually a little more verbose and have { on new lines, so I am realizing I didn't quite conform to the styles of the codebase. I'll get those fixed up later today at some point.

Jun 14 '21 12:06 tim-nordell-nimbelink

Maybe lfs_atomic_remove(...) and lfs_atomic_rename(...)? The base ones are atomic (if I understand it correctly), too, but we just handle a wider range of use-cases with this changeset.

Incidentally, in Linux there is a "renameat2(...)" which apparently accepts a flag to exchange the location of two folders with RENAME_EXCHANGE; not quite what is implemented here.

Jun 14 '21 14:06 tim-nordell-nimbelink

Hi @geky -

I added the lfs_remove_recursive(...) feature. So there are 3 new API functions overall:

lfs_atomic_remove(...)
lfs_atomic_rename(...)
lfs_remove_recursive(...)

The lfs_atomic_*(...) ones do not support folders with nested items in them, but the lfs_remove_recursive(...) one does. If I'm misunderstanding that removing a folder without nested folders is "atomic" from the filesystem perspective (e.g. after reboot you end up with either it not removed, or it completely removed), then it doesn't make sense to differentiate between these. I haven't thoroughly tested the lfs_remove_recursive(...) variant which is introduced in its own commit. I did add a minimal test case for this though into the test suite, but it could probably be more extensively tested.

The lfs_remove_recursive(...) will be a little more efficient than doing it externally outside of the filesystem code since it can walk the tags within a folder and decide to recurse into them without having to reload the whole tree up to that (and it doesn't require looking at the tag's name).

Thanks, Tim

Jun 14 '21 21:06 tim-nordell-nimbelink

I tested this code today, and wow, amazing. Thank you. Hope it gets added. We accumulate up to 20000 small json files in a folder in a few days, and deleting these with unlink one by one takes over 9.5 hours. Deleting the folder with with 5000 records using rmdir_atomic(), only took me 4040ms. I had to modify the esp_littlefs port I'm using on the ESP32 WROOM 32 E

Jun 16 '21 17:06 wreyford

@wreyford - Glad you found it useful. We have something similar were we can accumulate several small files and I needed it to unlink more quickly, too.

I did find my recursive removal function is broken, so that's why I marked this back to a draft. The atomic_remove(...) should be fine, though, and maybe I should just leave the PR for those before a recursive solution is added in properly.

Jun 16 '21 18:06 tim-nordell-nimbelink

@geky and @wreyford -

The lfs_remove_recursive(...) routine is fixed. I suppose a future change for that would be to remove the top-level folder (and marking it as an orphan), and then go through and systematically remove the inner links from the linked list directory structure like it does. That'd probably allow the whole sub-folder to be atomically removed rather than as it is being done now.

I'm pretty happy about the current state of the code for this changeset. I refactored it a tad so that the common states for a directory removal are held in a structure, and that single structure is pushed around to the helpers rather than all of the constituent components as arguments. That should reduce the code overhead, too, of calling the helper functions.

It'd probably be easier to review the individual commits as they move things through:

The first commit in the series moves to using the common structure
The second commit pulls out some common code from remove/rename around the usage of that structure
The third commit adds the atomic_remove/atomic_rename
The fourth commit adds the remove_recursive(...) functionality

It's easier to see the refactoring in the first two commits instead of looking at the overall series as a changeset.

-- Tim

Jun 17 '21 00:06 tim-nordell-nimbelink

I realized I had some redundant code in the lfs_remove_recursive(...) codepath. It's consolidated now.

Jun 17 '21 15:06 tim-nordell-nimbelink

Hmm, interesting that the global state is broken for arm but not for x86_64. They're both little endian architectures, so... I'll see if I can figure it out.

Jun 21 '21 14:06 tim-nordell-nimbelink

Hi @geky -

Doing some testing locally, and it appears that my assumption in some of the code was wrong (based on function names). I had assumed that the function lfs_dir_getgstate(...) set the output array, but it does an xor into the output array. So this code pattern was wrong:

static int lfs_dir_prep_remove_nonempty_folders(lfs_t *lfs, struct lfs_remove_state *p) {
    lfs_gstate_t split_gstate;
...
            // Before we fetch the next block, update our fetched gstate xor
            lfs_dir_getgstate(lfs, &p->dir.m, &split_gstate);
            lfs_gstate_xor(&p->gstate, &split_gstate);
...
}

It should have been:

static int lfs_dir_prep_remove_nonempty_folders(lfs_t *lfs, struct lfs_remove_state *p) {
...
            // Before we fetch the next block, update our fetched gstate xor
            lfs_dir_getgstate(lfs, &p->dir.m, &p->gstate);
...
}

My guess is it was failing inside qemu because of left-over data in the stack that was being xor'd (where I had made the assumption the functio filled it in). This has been fixed with the latest update.

I've also been assuming that I have to collect the global state from each of the splits within a directory for when dropping the directory. Is this assumption correct? I think the lfs_dir_drop(...) only handles the last "split" entry only.

Thanks, Tim

Jun 21 '21 16:06 tim-nordell-nimbelink

Any idea on the timeline to merge this? Thanks

Apr 06 '22 21:04 mrozekma

So, I have been sitting on this PR for a while because I've been unsure if it should go in or not. It is a really nice feature leveraging some unique properties of littlefs, and I greatly appreciate the work @tim-nordell-nimbelink has put in to get this PR into a merge-able state.

But I'm concerned with how it will interact with future improvements to littlefs. Specifically that supporting this API might restrict littlefs to the current allocator design (https://github.com/littlefs-project/littlefs/issues/75). https://github.com/littlefs-project/littlefs/issues/75 is a bit more pressing because it affects the performance of all operations in littlefs, and risks limiting the maximum possible size of storage littlefs can support. Depending on how https://github.com/littlefs-project/littlefs/issues/75 is resolved, changing the block allocator to use a bitmap for example, may or may not break this proposal.

So for now I don't think I can bring this in until https://github.com/littlefs-project/littlefs/issues/75 is resolved. This unfortunately means a much longer timeline before this PR could go in.

Apr 14 '22 05:04 geky

littlefs littlefs copied to clipboard

Feature/non empty dir removal

littlefs
littlefs copied to clipboard