thin-provisioning-tools icon indicating copy to clipboard operation
thin-provisioning-tools copied to clipboard

More strictly checking in thin_check ?

Open mingnus opened this issue 9 years ago • 7 comments

Hi Joe,

Currently, thin_check checks mapping tree and detail tree separately, which cannot check the compatibility of each other. For example, incompatible device IDs, or the data mapping root points to a wrong subtree (I had encountered once on an old kernel).

I suggest to do some more checking in quick mode (with --skip-mappings):

  1. Check the compatibility of keys in data mapping tree and device details tree
  2. For each thin device, check the highest key in data mapping tree >= device_details::mapped_blocks
  3. Check the reference count of btree roots and bitmap roots

And do this in extended mode (no --skip-mappings):

  1. For each thin device, check the number of keys in data mapping tree == device_details::mapped_blocks

How do you think?

mingnus avatar May 17 '16 11:05 mingnus

Sounds good

On Tue, 17 May 2016 12:16 Ming-Hung Tsai, [email protected] wrote:

Hi Joe,

Currently, thin_check checks mapping tree and detail tree separately, which cannot check the compatibility of each other. For example, incompatible device IDs, or the data mapping root points to a wrong subtree (I had encountered once on an old kernel).

I suggest to do some more checking in quick mode (with --skip-mappings):

  1. Check the compatibility of keys in data mapping tree and device details tree
  2. For each thin device, check the highest key in data mapping tree >= device_details::mapped_blocks
  3. Check the reference count of btree roots and bitmap roots

And do this in extended mode (no --skip-mappings):

  1. For each thin device, check the number of keys in data mapping tree == device_details::mapped_blocks

How do you think?

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/jthornber/thin-provisioning-tools/issues/62

jthornber avatar May 17 '16 12:05 jthornber

Would you like to do it recently? If not, I could help on it these weeks.

mingnus avatar May 17 '16 13:05 mingnus

I'm on vacation at the moment so won't get chance to look at it for a few weeks

On Tue, 17 May 2016 14:09 Ming-Hung Tsai, [email protected] wrote:

Would you like to do it recently? If not, I could help on it these weeks.

— You are receiving this because you commented.

Reply to this email directly or view it on GitHub https://github.com/jthornber/thin-provisioning-tools/issues/62#issuecomment-219712279

jthornber avatar May 17 '16 14:05 jthornber

Some ideas about getting the total number of mapped blocks of the bottom level trees:

method1. (The simplest way) Use walk_mapping_tree() to walk every nodes of all the bottom level trees, including the shared nodes (i.e., set btree_damage_visitor::avoid_repeated_visits_ to false). Then, count the mapped blocks in a customized mapping_visitor. However, due to the visiting shared nodes, we'll spend more time in walking the data mapping tree.

method2. Write a special btree_damage_visitor to count the number of mapped blocks of each subtree, without repeated visiting. This approach increases the code complexity and consumes more memory (maybe 32MB or even more, because I need to track the number of child-entries for each btree node)

Which one do you prefer?

mingnus avatar May 18 '16 17:05 mingnus

Reply to myself: Finally I choose method1 for its simplicity. I wrote another checker class for the mapping tree, which works like class mapping_tree_emitter: traverse the bottom-level trees individually, then check the number of mapped blocks and the highest mapped block after traversal.

The code is almost completed. Now some problems arise:

1 . index_store::count_metadata() and index_entry_counter::visit() don't validate the beneath bitmap blocks.

I wish to check the metadata space map (including all the bitmap blocks) in "thin_check --skip-mappings" because it's mandatory. However, both index_store::count_metadata() and index_entry_counter::visit() don't validate the beneath bitmap blocks. This is not an issue without --skip-mappings, because check_space_map_counts() reads reference counts of all the metadata blocks, thus all the bitmap blocks of metadata space map will be checked. In contrast, my new function check_space_map_counts_partial() won't do this:

// This is a simplified version of check_space_map_counts(),
// which only checks the ref_counts of blocks recorded by block_counter
error_state check_space_map_counts_partial() {
    block_counter bc;
    count_metadata_partial(tm, sb, bc); // count the blocks used as btree roots and bitmap blocks
    block_counter::count_map::const_iterator it = bc.get_counts().begin();
    for (; it != bc.get_counts().end(); ++it) {
        ref_t c_actual = metadata_sm->get_count(it->first);
        ref_t c_expected = it->second;

        if (c_actual != c_expected) {
            // raise error
        }
    }
}

Is there any concern to validate the bitmap inside index_store::count_metadata() and index_entry_counter::visit() ?

2 . counting_visitor should have the ability to pass errors to thin_check, to report errors in space map btree nodes. (In contrast, data space map bitmap errors could be reported by ValueCounter.)

There are some solutions:

(1) Add a function to block_counter for counting broken blocks

(2) Apply the chain-of-responsibility pattern on btree::visitor. For example, btree_damage_visitor, counting_visitor, and btree_value_extractor form a chain. The btree_damage_visitor has the responsibility to report errors, then passes good nodes to the succeeding counting_visitor. Finally, the btree_value_extractor passes the mapped values to ValueVisitor. The benefit is, we can do damage visiting and reference counting in a single traversal, but it requires a large code base change.

mingnus avatar May 25 '16 16:05 mingnus

Is this feature complete and merge to main branch?

sallyjunjun avatar Mar 29 '22 07:03 sallyjunjun

Not yet. The ongoing Rust version has a significant performance improvement, which solves the trade-off between the execution time and precision.

mingnus avatar Mar 29 '22 09:03 mingnus