data-diff
Make data-diff faster when there are lots of differences
Today, one of the caveats of data-diff is that it gets significantly slower when the two tables have a lot of differences, because we end up checksumming many segments repeatedly as we narrow down the differing rows. I'm not exactly sure what the best solution is, but it likely entails a threshold: if enough of the earlier segments turn out to differ, we increase the `--bisection-threshold` so that we stop bisecting sooner and compare the rows directly.
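One possible shape for that heuristic, as a rough sketch only. The function name `adapt_bisection_threshold`, its parameters, and the default values (`trigger_ratio`, `growth_factor`, `max_threshold`) are all hypothetical and not part of data-diff; the idea is just to raise the threshold once a large share of the segments checked so far have mismatching checksums.

```python
def adapt_bisection_threshold(
    current_threshold: int,
    segments_checked: int,
    segments_differing: int,
    *,
    trigger_ratio: float = 0.5,      # hypothetical: share of differing segments that triggers a bump
    growth_factor: int = 4,          # hypothetical: how aggressively to raise the threshold
    max_threshold: int = 1_000_000,  # hypothetical: cap so we never download huge segments
) -> int:
    """Return a (possibly larger) bisection threshold for the next round.

    If most segments checked so far differ, further bisection mostly repeats
    checksums, so we raise the threshold to fall back to row comparison sooner.
    """
    if segments_checked <= 0:
        # Nothing observed yet, so there is no basis for changing anything.
        return current_threshold

    differing_ratio = segments_differing / segments_checked
    if differing_ratio < trigger_ratio:
        # Differences look sparse; bisection is still paying off.
        return current_threshold

    # Differences look dense; grow the threshold but respect the cap,
    # and never shrink a threshold the user already set above the cap.
    raised = min(current_threshold * growth_factor, max_threshold)
    return max(current_threshold, raised)


if __name__ == "__main__":
    # 30 of 32 child segments differed, so the threshold is raised.
    print(adapt_bisection_threshold(16384, 32, 30))
    # Only 2 of 32 differed, so the threshold is left alone.
    print(adapt_bisection_threshold(16384, 32, 2))
```

The ratio-based trigger is only one option; a count-based trigger or a per-level decision might turn out to work better in practice, and the right constants would need benchmarking.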