Panic index out of bounds in side_by_side.rs on go.sum
While running git show --ext-diff I noticed the following at the end of the diff:
[…]
go.sum --- 1/2 --- Text
1 cloud.google.com/go v0.26.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw=
2 cloud.google.com/go v0.110.0 h1:Zc8gqp3+a9/Eyph2KDmcGaPtbKRIoqq4YTlL4NMD0Ys=
3 cloud.google.com/go v0.110.0/go.mod h1:SJnCLqQ0FCFGSZMUNUf84MV3Aia54kn7pi8st7tMzaY=
4 cloud.google.com/go/compute v1.19.1 h1:am86mquDUgjGNWxiGn+5PGLbmgiWXlE/yNWpIpNvuXY=
go.sum --- 2/2 --- Text
. 64 github.com/googleapis/enterprise-certificate-proxy v0.2.3/go.mod h1:AwSRAtLfXpU5Nm3pW+v7rGDHp09LsPtGY9MduiEsR9k=
. 65 github.com/googleapis/gax-go/v2 v2.8.0 h1:UBtEZqx1bjXtOQ5BVTkuYghXrr3N4V123VKJK67vJZc=
. 66 github.com/googleapis/gax-go/v2 v2.8.0/go.mod h1:4orTrqY6hXxxaUL4LHIPl6lGo8vAE38/qKbhSAKP6QI=
1 github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw= 67 github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=
2 github.com/naoina/go-stringutil v0.1.0 h1:rCUeRUHjBjGTSHl0VC00jUPLz8/F9dDzYI70Hzifhks= 68 github.com/naoina/go-stringutil v0.1.0 h1:rCUeRUHjBjGTSHl0VC00jUPLz8/F9dDzYI70Hzifhks=
3 github.com/naoina/go-stringutil v0.1.0/go.mod h1:XJ2SJL9jCtBh+P9q5btrd/Ylo8XwT/h1USek5+NqSA0= 69 github.com/naoina/go-stringutil v0.1.0/go.mod h1:XJ2SJL9jCtBh+P9q5btrd/Ylo8XwT/h1USek5+NqSA0=
4 github.com/naoina/toml v0.1.1 h1:PT/lllxVVN0gzzSqSlHEmP8MJB4MY2U7STGxiouV4X8= 70 github.com/naoina/toml v0.1.1 h1:PT/lllxVVN0gzzSqSlHEmP8MJB4MY2U7STGxiouV4X8=
5 github.com/naoina/toml v0.1.1/go.mod h1:NBIhNtsFMo3G2szEBne+bO4gS192HuIYRqfvOWb4i1E= 71 github.com/naoina/toml v0.1.1/go.mod h1:NBIhNtsFMo3G2szEBne+bO4gS192HuIYRqfvOWb4i1E=
. 72 github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
. 73 github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtY
. .. GdfGttqA=
6 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd h1:625/bJvSNfQrzzK5ttnUqMqnVe8M5MILmf5ZRGgeeDY= 74 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd h1:625/bJvSNfQrzzK5ttnUqMqnVe8M5MILmf5ZRGgeeDY=
7 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd/go.mod h1:1WNBiOZtZQLpVAyu0iTduoJL9hEsMloAK5XWrtW0xdY= 75 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd/go.mod h1:1WNBiOZtZQLpVAyu0iTduoJL9hEsMloAK5XWrtW0xdY=
thread 'main' panicked at src/display/side_by_side.rs:505:34:
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal: external diff died, stopping at go.sum
And sure enough, the diff output was not complete for the go.sum file and there should also be other files after it.
I reproduced the issue by running difft go.sum.old.txt go.sum.new.txt on the following two files: go.sum.old.txt and go.sum.new.txt
Here is the output with complete stacktrace:
RUST_LOG=info RUST_BACKTRACE=1 RUST_BACKTRACE=full difft go.sum.old.txt go.sum.new.txt | cat
go.sum.new.txt --- 1/2 --- Text
File permissions changed from 100664 to 100644.
1 cloud.google.com/go v0.26.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw=
2 cloud.google.com/go v0.110.0 h1:Zc8gqp3+a9/Eyph2KDmcGaPtbKRIoqq4YTlL4NMD0Ys=
3 cloud.google.com/go v0.110.0/go.mod h1:SJnCLqQ0FCFGSZMUNUf84MV3Aia54kn7pi8st7tMzaY=
4 cloud.google.com/go/compute v1.19.1 h1:am86mquDUgjGNWxiGn+5PGLbmgiWXlE/yNWpIpNvuXY=
go.sum.new.txt --- 2/2 --- Text
. 64 github.com/googleapis/enterprise-certificate-proxy v0.2.3/go.mod h1:AwSRAtLfXpU5Nm3pW+v7rGDHp09LsPtGY9MduiEsR9k=
. 65 github.com/googleapis/gax-go/v2 v2.8.0 h1:UBtEZqx1bjXtOQ5BVTkuYghXrr3N4V123VKJK67vJZc=
. 66 github.com/googleapis/gax-go/v2 v2.8.0/go.mod h1:4orTrqY6hXxxaUL4LHIPl6lGo8vAE38/qKbhSAKP6QI=
1 github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw= 67 github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=
2 github.com/naoina/go-stringutil v0.1.0 h1:rCUeRUHjBjGTSHl0VC00jUPLz8/F9dDzYI70Hzifhks= 68 github.com/naoina/go-stringutil v0.1.0 h1:rCUeRUHjBjGTSHl0VC00jUPLz8/F9dDzYI70Hzifhks=
3 github.com/naoina/go-stringutil v0.1.0/go.mod h1:XJ2SJL9jCtBh+P9q5btrd/Ylo8XwT/h1USek5+NqSA0= 69 github.com/naoina/go-stringutil v0.1.0/go.mod h1:XJ2SJL9jCtBh+P9q5btrd/Ylo8XwT/h1USek5+NqSA0=
4 github.com/naoina/toml v0.1.1 h1:PT/lllxVVN0gzzSqSlHEmP8MJB4MY2U7STGxiouV4X8= 70 github.com/naoina/toml v0.1.1 h1:PT/lllxVVN0gzzSqSlHEmP8MJB4MY2U7STGxiouV4X8=
5 github.com/naoina/toml v0.1.1/go.mod h1:NBIhNtsFMo3G2szEBne+bO4gS192HuIYRqfvOWb4i1E= 71 github.com/naoina/toml v0.1.1/go.mod h1:NBIhNtsFMo3G2szEBne+bO4gS192HuIYRqfvOWb4i1E=
. 72 github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
. 73 github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtY
. .. GdfGttqA=
6 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd h1:625/bJvSNfQrzzK5ttnUqMqnVe8M5MILmf5ZRGgeeDY= 74 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd h1:625/bJvSNfQrzzK5ttnUqMqnVe8M5MILmf5ZRGgeeDY=
7 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd/go.mod h1:1WNBiOZtZQLpVAyu0iTduoJL9hEsMloAK5XWrtW0xdY= 75 github.com/streadway/amqp v0.0.0-20160311215503-2e25825abdbd/go.mod h1:1WNBiOZtZQLpVAyu0iTduoJL9hEsMloAK5XWrtW0xdY=
thread 'main' panicked at src/display/side_by_side.rs:505:34:
index out of bounds: the len is 7 but the index is 7
stack backtrace:
0: 0x558cbfe55137 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hb2d11bf794d51a44
1: 0x558cbfc8e800 - core::fmt::write::hc9c39fd060f58be5
2: 0x558cbfe474ce - std::io::Write::write_fmt::h0f12046d62ab82d9
3: 0x558cbfe54ea4 - std::sys_common::backtrace::print::had20f5e369bd8b8b
4: 0x558cbfe32a4a - std::panicking::default_hook::{{closure}}::h5f274992636625c8
5: 0x558cbfe32709 - std::panicking::default_hook::hda96c8e22c1e561c
6: 0x558cbfe32dce - std::panicking::rust_panic_with_hook::hb9ee0580a171cf05
7: 0x558cbfe556ca - std::panicking::begin_panic_handler::{{closure}}::h6a91c522b14e319c
8: 0x558cbfe55386 - std::sys_common::backtrace::__rust_end_short_backtrace::h0c1987780043709b
9: 0x558cbfe32b20 - rust_begin_unwind
10: 0x558cbfbe7765 - core::panicking::panic_fmt::haffa8b258e06944a
11: 0x558cbfbe7972 - core::panicking::panic_bounds_check::h7ece44e9b2991f77
12: 0x558cbfcc7804 - difft::display::side_by_side::print::h395062f7207a1a24
13: 0x558cbfcebc10 - difft::print_diff_result::hf2d672fb3044ffd4
14: 0x558cbfce76d5 - difft::main::h33c7d715c916808b
15: 0x558cbfcd10c3 - std::sys_common::backtrace::__rust_begin_short_backtrace::h9f85ef8d11c91a26
16: 0x558cbfd18649 - std::rt::lang_start::{{closure}}::hedbbc4005407dc2d
17: 0x558cbfe39155 - std::rt::lang_start_internal::h18187536d7e524f6
18: 0x558cbfcebff5 - main
19: 0x7fc187c28150 - __libc_start_call_main
at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
20: 0x7fc187c28209 - __libc_start_main_impl
at ./csu/../csu/libc-start.c:360:3
21: 0x558cbfc08d45 - _start
22: 0x0 - <unknown>
My difft version:
$ difft --version
Difftastic 0.56.1 (built with rustc 1.76.0)
By the way, great tool! :muscle:
Have similar error on the same difftastic version (using macOS 14.4.1 on M2 MacBook). My backtrace looks like:
thread 'main' panicked at src/display/side_by_side.rs:515:34:
index out of bounds: the len is 24 but the index is 24
stack backtrace:
0: 0x102c470a0 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hc44f794219052ac6
1: 0x102acea18 - core::fmt::write::h3be8b1c9593e07b4
2: 0x102c5d380 - std::io::Write::write_fmt::h95a1e6c5537a96b5
3: 0x102c46eb8 - std::sys_common::backtrace::print::h7b809dd1f18ea674
4: 0x102c5f2a8 - std::panicking::default_hook::{{closure}}::he339b0dd23f9052e
5: 0x102c5efbc - std::panicking::default_hook::he9ee42fa25c512f5
6: 0x102c5f758 - std::panicking::rust_panic_with_hook::hd4ada112570991e5
7: 0x102c47724 - std::panicking::begin_panic_handler::{{closure}}::h3f2b6a80c2b7e617
8: 0x102c472e4 - std::sys_common::backtrace::__rust_end_short_backtrace::hc0b94e2951df3fae
9: 0x102c5f4fc - _rust_begin_unwind
10: 0x103427a74 - core::panicking::panic_fmt::h0ed3af01737be6a9
11: 0x103427bcc - core::panicking::panic_bounds_check::hd8547d2ac1a831c3
12: 0x102b2e45c - difft::display::side_by_side::print::h828c049d1c24d463
13: 0x102b252f0 - difft::print_diff_result::h698bc527a35012a9
14: 0x102b21f28 - difft::main::h70547995380bd13e
15: 0x102b3ac48 - std::sys_common::backtrace::__rust_begin_short_backtrace::he6ac8584ea647aa6
16: 0x102b486c8 - std::rt::lang_start::{{closure}}::h73c9888aa910514a
17: 0x102c5f3f8 - std::panicking::try::h3128bce4d588bc41
18: 0x102c49408 - std::rt::lang_start_internal::h57a49fa80dd8b904
19: 0x102b25650 - _main
fatal: external diff died, stopping at README.md
FYI, the crash is still here with Difftastic 0.57.0 (built with rustc 1.77.1).
I've just bisected the problem to this commit: 3be8e80fe73d10bc92944eb55d822d27a380c5f8.
To trigger this issue you only need to have enough words to cross the MAX_WORDS_IN_LINE. I was consistently getting this problem with some log files, so I replaced everything with : (which counts as word) and added them to a gist.
To reproduce the issue just use those files and run difft against them:
$ curl -o bad_left 'https://gist.githubusercontent.com/JonathanxD/450595f7a25cca43076ca5083ee73e33/raw/85ac535c67fb40c30006e5622b015c387c06cbe4/bad_left'
$ curl -o bad_right 'https://gist.githubusercontent.com/JonathanxD/450595f7a25cca43076ca5083ee73e33/raw/85ac535c67fb40c30006e5622b015c387c06cbe4/bad_right'
$ difft bad_left bad_right
And I just found out that the order does not really matter (you can swap left and right and it still panics).
After further investigation, and looking around the code a bit, I found out that this line should actually be:
index d54731d59..562d327d1 100644
--- a/src/line_parser.rs
+++ b/src/line_parser.rs
@@ -152,7 +152,7 @@ pub(crate) fn change_positions(lhs_src: &str, rhs_src: &str) -> Vec<MatchedPos>
// have a very large number of words, don't diff
// individual words.
if lhs_words.len() > MAX_WORDS_IN_LINE || rhs_words.len() > MAX_WORDS_IN_LINE {
- for lhs_pos in lhs_lp.from_region(lhs_offset, lhs_offset + lhs_part.len()) {
+ for lhs_pos in lhs_lp.from_region(lhs_offset, lhs_offset + line_len_in_bytes(&lhs_part)) {
mps.push(MatchedPos {
kind: MatchKind::NovelWord {
highlight: TokenKind::Atom(AtomKind::Normal),
I ran all the tests with this patch and every seems fine. ~~The only thing that is unclear to me is why those calls use this function, but those don't: 1, 2.~~ (it's because those are word-based, not line-based).
Today I was updating difft and applying my patches and I thought: “Why not run a bisect against pretty much all of those cases again?” So, that's what I did.
Results
1st
The comment https://github.com/Wilfred/difftastic/issues/694#issuecomment-2066249883 talks about a yaml that triggers an out-of-bounds. This was the very first one that started failing during the bisect:
The bad commit is¹: ffd49d523a3ac2d3cfdec7dc463f8b20eb524c3a The last commit that it worked without panics was (which is just parent commit): f86ba13abfdc247c7963f2757917891bf05ca722 The diff: f86ba13abfdc247c7963f2757917891bf05ca722...ffd49d523a3ac2d3cfdec7dc463f8b20eb524c3a
Note that the only case that starts breaking at this point is the yaml one, and it stays that way during the entire bisect session.
2nd (already fixed)
The JQuery case (https://github.com/Wilfred/difftastic/issues/694#issue-2226773984) was broken for quite some time, but was fixed at some point:
The bad commit was: 1e7866b64ec083385a51affceb31a2571f9e3539 The last good commit: 2b9aca85f38624a31ca6f6a22285d0b710c9b70d The diff: 2b9aca85f38624a31ca6f6a22285d0b710c9b70d...1e7866b64ec083385a51affceb31a2571f9e3539
This one is not broken anymore and I didn't bisect the commit that actually fixed it.
3rd
And finally, those ones: https://github.com/Wilfred/difftastic/issues/688#issue-2212713540, https://github.com/Wilfred/difftastic/issues/681#issue-2206981128 and https://github.com/Wilfred/difftastic/issues/682#issue-2207630735.
The bad commit is: 53298e42404bd4b389e74597e6dba26518eb82b9 The last good commit: 7e8f928926cd46603d27f37bdfc38717fc52bb3d The diff: 7e8f928926cd46603d27f37bdfc38717fc52bb3d...53298e42404bd4b389e74597e6dba26518eb82b9
I hope this helps.
I think @JonathanxD is right here. I'm running into this issue as well, and running difft under cargo rr shows that I can reliably trigger the bug when lhs_words.len() > 1000 with a breakpoint at line_parser.rs:155.
Here's how the crash happens:
- When calculating
DiffResults inmain,line_parser::change_positionsis called in order to calculate the value for thelhs_positionsfield (diff_filecallsdiff_file_contentcallsline_parser::change_positions). - Because of the bug in
line_parser.rs:155,change_positionsactually calculates one extraSingleLineSpanbecause it does not correctly strip the trailing newline at the end of all files before callinglhs_lp.from_region, and therefore one extraMatchedPosgets pushed to the result inline_parser.rs:156. - When this is consumed downstream in
difft::display::side_by_side::print(passed in aslhs_mps, which comes from the constructedDiffResult), it causes an off-by-one error indisplay/side_by_side.rs:492because there is an extralhs_line_numinaligned_lines, sincealigned_linesis calculated using thelhs_mpsfrom (2), but the actual lines to print inlhs_colored_linesis calculated usinglhs_src, which does not contain an extra line.
I think the patch suggested in https://github.com/Wilfred/difftastic/issues/688#issuecomment-2036847084, and applying it does prevent the bug from occurring in my test case.
Hmm, the pull request fixes the bad_left/right case with colons, but not the go.sum.txt case.
OK, so AFAICS the underlying issue is that difftastic assumes that it can split a string on lines, join them later, and get the same string. This isn't true in .lines() in Rust's stdlib, which treats foo\nbar and foo\nbar\n the same. This used to have custom line splitting logic in difftastic, but now just uses the stdlib (changed in 8c004be87b5553809d1d865fc76adb754c2c3b55, see also cb9367c1298d253b4c2a0ec2536326abd388d043 and 8b842387a16e5ae4a79aaf6befb0bac1894882d0).
As a result, difftastic crashes when you have a large (>1000 words) change between two text files, where the change ends with a newline. There are probably other places in the code that rely on this invariant.