Improve 'subtree mostly changed' heuristics
See e.g. main.rs 4883edd90cc8011041a6cee0622805d6bc7847a0 hunk 5/11 or the res.append dot in e1ffa2af2a38917133a06c823fd0fc365f5eaa0d.
Old:
fn diff_file() {
    let rhs_binary = true;
    if rhs_binary {
        print!("{}", style::header(display_path, 1, 1, "binary"));
        return;
    }
    let extension = 1;
}
New:
fn diff_file() {
    let rhs_binary = true;
    if rhs_binary {
        return DiffResult {
            path: display_path.into(),
            language: None,
            binary: true,
            lhs_src: "".into(),
            lhs_positions: vec![],
            rhs_positions: vec![],
        };
    }
    let extension = 1;
}
errors.mli in HHVM 71abf8d56763d497e95c1e79e13724e6e103c32d is another good example. That diff might benefit from never splitting an LHS top-level node over multiple RHS top-level nodes. IIRC Autochrome has some ideas in this space.
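One way to make the "never split an LHS top-level node" idea concrete is a check over the matches a diff has produced. The sketch below is illustrative only: the function name `split_lhs_top_level` and the representation of matches as `(lhs_top, rhs_top)` index pairs are assumptions, not existing code. Each pair records which top-level node on each side contains a pair of matched nodes.

```rust
use std::collections::HashMap;

/// Return the index of the first LHS top-level node whose matched
/// descendants land in more than one RHS top-level node, if any.
///
/// Hypothetical helper: `matches` holds (lhs_top, rhs_top) pairs, one per
/// matched node pair, giving the top-level node that contains each side.
fn split_lhs_top_level(matches: &[(usize, usize)]) -> Option<usize> {
    // Remember the first RHS top-level node each LHS top-level node hit.
    let mut seen: HashMap<usize, usize> = HashMap::new();
    for &(lhs_top, rhs_top) in matches {
        match seen.get(&lhs_top).copied() {
            // Same LHS node already matched into a different RHS node.
            Some(prev) if prev != rhs_top => return Some(lhs_top),
            Some(_) => {}
            None => {
                seen.insert(lhs_top, rhs_top);
            }
        }
    }
    None
}
```

A diff that returns `Some(_)` here could be rejected or penalised. Whether that is best done as a hard constraint or as an extra cost is left open.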
Giving top-level lists a bigger discount for being wholly novel might help matters.
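As a rough illustration of the discount idea, the sketch below compares the cost of marking a list novel piece by piece with the cost of marking it wholly novel in one step. The `Node` type, the cost constants, and the discount factors (half price at the top level, three quarters when nested) are all made-up assumptions for the example, not the values or types the tool actually uses.

```rust
/// Hypothetical, simplified syntax node for illustration.
#[derive(Debug, Clone)]
enum Node {
    Atom(String),
    List(Vec<Node>),
}

// Illustrative costs only; the numbers are assumptions.
const NOVEL_ATOM_COST: u64 = 100;
const NOVEL_DELIMITER_COST: u64 = 100;

/// Cost of marking every part of `node` as novel, with no discount.
fn novel_cost(node: &Node) -> u64 {
    match node {
        Node::Atom(_) => NOVEL_ATOM_COST,
        Node::List(children) => {
            NOVEL_DELIMITER_COST + children.iter().map(novel_cost).sum::<u64>()
        }
    }
}

/// Cost of marking `node` as wholly novel in a single step.
/// `depth` is 0 for top-level nodes, which get the larger discount.
fn wholly_novel_cost(node: &Node, depth: usize) -> u64 {
    let full = novel_cost(node);
    match node {
        // Atoms have no children, so there is nothing to discount.
        Node::Atom(_) => full,
        // Top-level lists: assumed 50% discount.
        Node::List(_) if depth == 0 => full / 2,
        // Nested lists: assumed 25% discount.
        Node::List(_) => full * 3 / 4,
    }
}
```

With these numbers, a list of three atoms costs 400 when marked piecewise, 300 when marked wholly novel as a nested list, and 200 when marked wholly novel at the top level, so a search over diffs would lean towards treating an entirely new top-level definition as one novel unit.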
Deadgrep 14c7d6b74c7891ed7294abe1a6f5914948e4ab49 has an interesting example where deadgrep--directory has its body factored out.
But deadgrep bdcdf138cd71b0a5a80ca64b3bd68b7355084757 is an example where the newly defined function deadgrep--escape-backslash is treated as partially reused rather than wholly novel.