
NFFS can store a valid file-system instance which is unrecoverable after reset.

Open nvlsianpu opened this issue 6 years ago • 18 comments

It is possible that, during regular filesystem operation, the filesystem footprint becomes shaped such that a subsequent NFFS restore operation exceeds the maximum number of blocks at its peak. This is caused by a difference between the two algorithms, both of which should lead to the same expected NFFS runtime-variable state.

My proposed resolution is to rework the NFFS restore into a more sequential process than it is now (see the sketch below):

  • First, restore inodes to RAM.
  • Second, restore valid blocks to RAM; because the inode state is known by then, the maximum block count shouldn't be exceeded.

It also looks like both scans should be performed by traversing the areas from oldest to newest. A descending traversal would be even more effective, but it is not supported by the layout of NFFS objects on flash.
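A rough sketch of the two-pass restore; `nffs_restore_scan()` and the scan-type enum are hypothetical names, not the existing NFFS API:

```c
enum nffs_scan_type {
    NFFS_SCAN_INODES,
    NFFS_SCAN_BLOCKS,
};

/* Hypothetical helper: walks every area from oldest to newest and
 * restores only objects of the requested type. */
extern int nffs_restore_scan(enum nffs_scan_type type);

static int
nffs_restore_two_pass(void)
{
    int rc;

    /* Pass 1: restore all inodes (including deletion records) to RAM. */
    rc = nffs_restore_scan(NFFS_SCAN_INODES);
    if (rc != 0) {
        return rc;
    }

    /* Pass 2: restore data blocks.  The full inode state is now known,
     * so obsolete blocks can be discarded immediately and the block
     * pool's peak usage matches its steady-state usage. */
    return nffs_restore_scan(NFFS_SCAN_BLOCKS);
}
```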

nvlsianpu avatar Sep 24 '18 15:09 nvlsianpu

@carlescufi @andrzej-kaczmarek Any comments on this? I will try to resolve the issue soon. Is there anyone else worth adding to the loop for discussion?

nvlsianpu avatar Sep 24 '18 15:09 nvlsianpu

As a consequence, at some point during execution the call `rc = nffs_restore_object(&disk_object);` returns the FS_ENOMEM error code, which is ignored.
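For illustration, the minimal fix at that call site would be to propagate the error instead of dropping it (a paraphrase of the restore loop, not the verbatim source):

```c
rc = nffs_restore_object(&disk_object);
if (rc != 0) {
    /* FS_ENOMEM means an object pool was exhausted mid-restore;
     * aborting is better than silently continuing with a partial
     * in-RAM filesystem. */
    return rc;
}
```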

nvlsianpu avatar Sep 25 '18 09:09 nvlsianpu

@andrzej-kaczmarek @sjanc can you please help us make this known upstream in Mynewt to see if someone there can help @nvlsianpu with the analysis and fix?

carlescufi avatar Sep 25 '18 09:09 carlescufi

@ccollins476ad could you take a look at this?

carlescufi avatar Sep 25 '18 09:09 carlescufi

Thanks for reporting this, @nvlsianpu. I agree with the analysis, and I think the proposed solution is a good one.

EDIT: I keep convincing myself that there is still the possibility of failure, even if inodes are processed before data blocks. I will check back after I refamiliarize myself with NFFS!

ccollins476ad avatar Sep 27 '18 17:09 ccollins476ad

Just to follow up on my last comment - I do think a problem remains, even if we fully process inodes before data blocks. Specifically, the inode pool can become exhausted during file system restoration.

The below is not the greatest explanation in the world. Please don't hesitate to ask if something is not clear.

Unfortunately, when we discover an inode deletion record, we cannot immediately discard the deleted inode and its children. Instead, we need to remember that these items are deleted. If we were to discard deleted inodes during the restore process, that would introduce an ambiguity for subsequently discovered children: is the parent deleted, or has it just not been discovered yet? Currently, we are allowed to assume that the parent is not discovered yet, because we maintain records of deleted inodes.

I have an idea for how to solve this. I think the core issue here is that the contents of an NFFS disk are not in chronological order. For example, say a disk contains 1) a parent directory P, 2) a child file C, and 3) a deletion record for P. NFFS allows these objects to be located on disk in any order relative to one another. If we had some guarantee that children always appear after their parent, and that deletion records are after the inodes they delete, then the restore process would not need to remember which inodes were deleted. Instead, NFFS could assume that an orphan's parent has been deleted, and could safely discard the orphan.

Currently, NFFS writes an object to any disk area that can accommodate it. This is what introduces the uncertainty in the ordering of objects. Instead, NFFS could write to areas in a sequential fashion. If the object being written does not fit in the current area, NFFS would select the next area as its new current area. Nothing new would get written to the first area until it is garbage collected, even if it could accommodate a subsequent write.
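A minimal sketch of that write policy, assuming hypothetical helper names (none of these are the actual NFFS API):

```c
#include "fs/fs.h"   /* Mynewt fs error codes (FS_EFULL) */

struct nffs_area;

extern struct nffs_area *nffs_cur_area;   /* current write area */
extern int nffs_area_free_space(const struct nffs_area *area);
extern struct nffs_area *nffs_area_next(struct nffs_area *area);
extern int nffs_area_append(struct nffs_area *area, const void *data,
                            int len);

static int
nffs_write_sequential(const void *data, int len)
{
    /* Never fall back to an earlier area, even if the object would fit
     * there; this guarantees that on-disk order matches write order. */
    while (nffs_area_free_space(nffs_cur_area) < len) {
        nffs_cur_area = nffs_area_next(nffs_cur_area);
        if (nffs_cur_area == NULL) {
            return FS_EFULL;   /* only garbage collection frees space */
        }
    }
    return nffs_area_append(nffs_cur_area, data, len);
}
```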

ccollins476ad avatar Sep 27 '18 21:09 ccollins476ad

Thanks for the elaboration - you are right in your analysis.

What you propose will increase flash wear, especially if objects are big in comparison to the area size.

We can consider using additional RAM temporarily, just during initialization:

  • inode size on disk: 20 B + file_name/directory_name
  • data_block size on disk: 24 B + data
  • nffs_area header on disk: 3 B

File deletion, worst case: the smallest possible file is created and immediately deleted: [data_block with 1 B of data] + [inode with a 1 B file_name] + [deletion inode] = 25 + 21 + 20 = 66 B

For each deleted inode, only the ID and sequence number must be known during flash-image parsing, so each worst-case deletion requires 8 B of RAM for its deletion info.

The worst case is a partition entirely filled with deleted files, no directories at all, and regular area sizes:

deleted_inode_info_max = ((partition_size - area_size - area_count * 3) / 66) * 8, so the RAM/flash ratio is around 1/8.

For an NFFS instance containing 2 areas of 4096 B, 62 deleted-inode-info entries are required -> 496 B. I think this is much too much, and it will be even worse if you consider the directory-create-delete scenario.
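For reference, a standalone check of the arithmetic above (all constants are taken from the estimates in this comment):

```c
#include <stdio.h>

int main(void)
{
    unsigned partition = 2 * 4096;  /* total partition size */
    unsigned scratch   = 4096;      /* one area reserved as scratch */
    unsigned headers   = 2 * 3;     /* 3 B area header per area */
    unsigned per_file  = 66;        /* smallest create+delete footprint */
    unsigned per_entry = 8;         /* (ID, seq) pair kept in RAM */

    /* Round up: how many worst-case deleted files fit on flash? */
    unsigned entries = (partition - scratch - headers + per_file - 1)
                       / per_file;
    printf("%u entries -> %u B RAM\n", entries, entries * per_entry);
    return 0;                       /* prints "62 entries -> 496 B RAM" */
}
```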

But I also have an idea for changing the management of IDs:

nvlsianpu avatar Oct 01 '18 12:10 nvlsianpu

Let's assume that NFFS tries to avoid making holes in the inode ID range - this means it will use only a small range of IDs, bounded by the maximum number of each kind of inode. This means we need to remember the (ID, sequence number) pair of each deleted inode.

```c
struct nffs_deleted_inode {
    uint32_t ndi_id;            /* deleted object ID. */
    uint32_t ndi_seq;           /* Sequence number. */
};
```

Whenever NFFS attempts to create a new inode, it will take a spare ID from the deletion list (reusing it with an incremented sequence number, obviously), and only if the list is empty will it allocate a brand-new ID. For the current NFFS we need to create a list of deleted inode ID-seq pairs with a capacity of FS_NFFS_NUM_INODES * 2 (as there are FILE and DIRECTORY inode kinds).
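A minimal sketch of that allocation policy; `nffs_deleted_list_pop()` and `nffs_next_file_id` are hypothetical names, and struct nffs_deleted_inode is repeated from above:

```c
#include <stdint.h>

struct nffs_deleted_inode {
    uint32_t ndi_id;            /* deleted object ID. */
    uint32_t ndi_seq;           /* Sequence number. */
};

/* Hypothetical helpers: pop an entry from the deleted-inode list
 * (returns nonzero when the list is empty) and a fresh-ID counter. */
extern int nffs_deleted_list_pop(struct nffs_deleted_inode *out);
extern uint32_t nffs_next_file_id;

static void
nffs_alloc_file_id(uint32_t *out_id, uint32_t *out_seq)
{
    struct nffs_deleted_inode del;

    if (nffs_deleted_list_pop(&del) == 0) {
        /* Reuse the hole; the bumped sequence number makes the new
         * inode supersede the old deletion record on disk. */
        *out_id = del.ndi_id;
        *out_seq = del.ndi_seq + 1;
    } else {
        /* No holes to fill: hand out a brand-new ID. */
        *out_id = nffs_next_file_id++;
        *out_seq = 0;
    }
}
```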

Such a patch will ensure that we always fit within memory constraints when restoring NFFS, but RAM consumption per inode will increase.

nvlsianpu avatar Oct 01 '18 12:10 nvlsianpu

@nvlsianpu, I think I understand your proposed solution, and I believe it would indeed solve this problem. However, I think there may be another issue that the solution does not address (this issue exists in the current implementation as well).

Key

| Symbol  | Meaning                                         |
|---------|-------------------------------------------------|
| DEL x   | Deletion record for inode x                     |
| fx(p=y) | File inode with ID x; parent directory has ID y |
| bx(p=y) | Data block with ID x; parent file has ID y      |

Say we have the following NFFS layout:

| Area 0    | Area 1    | Scratch | Area 3    |
|-----------|-----------|---------|-----------|
| f100(p=0) | b3(p=100) | ...     | b5(p=200) |
| b1(p=100) | b4(p=100) | ...     | b6(p=200) |
| b2(p=100) | f200(p=0) | ...     | DEL 100   |

In other words, we have two files, 100 and 200. Each file contains several data blocks. Finally, file 100 has recently been deleted.

When garbage collection occurs, the disk will now look like this:

| Area 0    | Area 1    | Area 2    | Scratch |
|-----------|-----------|-----------|---------|
| f100(p=0) | b3(p=100) | b5(p=200) | ...     |
| b1(p=100) | b4(p=100) | b6(p=200) | ...     |
| b2(p=100) | f200(p=0) |           | ...     |

That is, the deletion record gets erased.

If the system reboots, file 100 now looks valid again; its deletion record was lost.

I might not be thinking hard enough, but I still believe we need some chronological ordering of records within the disk.

ccollins476ad avatar Oct 02 '18 22:10 ccollins476ad

So deletion inode records are not garbage collected properly, as I understand it. I think that if we apply my proposition of managing ID ranges, we will be able to keep a RAM representation of each deletion inode entry. It might be an extension of the previously proposed struct nffs_deleted_inode.

nvlsianpu avatar Oct 03 '18 09:10 nvlsianpu

@nvlsianpu , perhaps I just don't understand your proposed solution. During garbage collection, how would NFFS know to preserve the deletion record?

ccollins476ad avatar Oct 03 '18 14:10 ccollins476ad

So struct nffs_deleted_inode needs to know where the overridden inode is - we need to add a reference to it. This reference should be null when the referenced inode is deleted.
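For example, the extension could look like this; the ndi_del_* fields and the (area, offset) addressing are assumptions for illustration, not existing NFFS code:

```c
#include <stdint.h>

#define NFFS_DEL_NONE 0xffffffffu   /* "no deletion record" marker */

struct nffs_deleted_inode {
    uint32_t ndi_id;            /* Deleted object ID. */
    uint32_t ndi_seq;           /* Sequence number. */
    /* Assumed flash reference: where the deletion record lives, so
     * garbage collection knows to carry it over when collecting that
     * area.  ndi_del_off is NFFS_DEL_NONE when no record is tracked. */
    uint8_t  ndi_del_area;      /* Area index of the deletion record. */
    uint32_t ndi_del_off;       /* Offset within that area. */
};
```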

nvlsianpu avatar Oct 04 '18 09:10 nvlsianpu

@ccollins476ad - I just looked at the nffs_gc() code: https://github.com/apache/mynewt-nffs/blob/master/src/nffs_gc.c#L498. It looks like blocks which don't have a parent inode in the `from_area` won't be copied. Am I right, or did I miss something?

nvlsianpu avatar Oct 04 '18 10:10 nvlsianpu

@nvlsianpu, I don't believe that is the case. The code applies nffs_gc_inode_blocks() to every inode in the system, regardless of which area they reside in.

ccollins476ad avatar Oct 04 '18 18:10 ccollins476ad

@ccollins476ad You are right - I missed the iteration range. Sorry for the false alarm. What do you think about:

> So struct nffs_deleted_inode needs to know where the overridden inode is - we need to add a reference to it. This reference should be null when the referenced inode is deleted.

?

nvlsianpu avatar Oct 04 '18 19:10 nvlsianpu

Hello, we are currently using NFFS in our project. We also observed problems during the file system init phase if the same file was written multiple times. To resolve this issue, do you think changes to the metadata stored in flash are required? (This concerns migration of NFFS during a firmware update.)

MitterdorferMathias avatar Oct 17 '18 08:10 MitterdorferMathias

@MitterdorferMathias As you can read above, there are currently two candidate bug-fix approaches. In both, the metadata of data and inode objects is kept the same. It is possible that the metadata of the area object will change. But with both solutions, NFFS will be unable to parse the previous FS flash representation.

nvlsianpu avatar Oct 17 '18 10:10 nvlsianpu

@ccollins476ad @utzig Any update?

nvlsianpu avatar Feb 19 '19 09:02 nvlsianpu