Panic: Slice index out of range
Got this panic with version 0.16.2
thread 'tokio-runtime-worker' panicked at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/html2text-0.16.2/src/render/text_renderer.rs:1948:33:
range end index 18446744073709551615 out of range for slice of length 0
with this call
/// Extract text from HTML.
fn extract_text(html: &str) -> Result<String, &str> {
let config = config::with_decorator(TrivialDecorator::new());
config
.string_from_read(html.as_bytes(), usize::MAX)
.map_err(|_| "Cannot parse text")
}
This error comes from a stripped release version running in production, so I will not be able to provide further details.
Thank you for the report! Hopefully the panic message will be enough to work out the problem and find a test case (but might not be immediate).
The same panic still happens with 0.16.4.
Maybe I am missing something, but looking at the code in append_columns_with_borders function in text_renderer.rs the problem seems to be obvious:
fn append_columns_with_borders<I>(&mut self, cols: I, collapse: bool) -> Result<()>
where
I: IntoIterator<Item = (Self, usize)>,
Self: Sized,
{
use self::TaggedLineElement::Str;
html_trace!("append_columns_with_borders(collapse={})", collapse);
html_trace!("self=<<<\n{}>>>", self.to_string());
self.flush_wrapping()?;
let mut tot_width = 0;
let mut line_sets = cols
.into_iter()
.map(|(sub_r, rowspan)| {
let width = sub_r.width;
let pos = tot_width;
tot_width += width + 1;
html_trace!("Adding column:\n{}", sub_r.to_string());
Ok(LineSet {
pos,
width,
rowspan,
lines: sub_r
.into_lines()?
.into_iter()
.map(|mut line| {
match line {
RenderLine::Text(ref mut tline) => {
tline.pad_to(width, &self.ann_stack);
}
RenderLine::Line(ref mut border) => {
border.stretch_to(width);
}
}
line
})
.collect(),
})
})
.collect::<Result<Vec<LineSet<_>>>>()?;
{
// Character position from line sets.
let mut lidx = 0;
let mut lnextpos = 0;
for ls in std::mem::take(&mut self.overhang_cells) {
while lidx < line_sets.len() && line_sets[lidx].pos < ls.pos {
let lpos = line_sets[lidx].pos;
lnextpos = lpos + line_sets[lidx].width + 1;
lidx += 1;
}
if lidx >= line_sets.len() {
// This row doesn't extend this far.
if lnextpos < ls.pos {
// We need padding
line_sets.push(LineSet {
pos: lnextpos,
width: ls.pos.saturating_sub(lnextpos + 1),
rowspan: 1,
lines: Default::default(),
});
}
if ls.pos + ls.width > tot_width {
// +1 because we subtract one later
tot_width = ls.pos + ls.width + 1;
}
line_sets.push(ls);
} else {
//debug_assert_eq!(ls.width, line_sets[lidx].width);
line_sets[lidx] = ls;
}
}
}
// If we haven't added any columns, tot_width will be 0; otherwise
// it will be one too high for a final unneeded border.
tot_width = tot_width.saturating_sub(1);
let mut next_border = BorderHoriz::new(tot_width, self.ann_stack.clone());
// Join the vertical lines to all the borders
if let Some(RenderLine::Line(prev_border)) = self.lines.back_mut() {
let mut pos = 0;
html_trace!("Merging with last line:\n{}", prev_border.to_string());
for ls in &line_sets[..line_sets.len() - 1] { <============ PANIC here
let w = ls.width;
html_trace!("pos={}, w={}", pos, w);
prev_border.join_below(pos + w);
next_border.join_above(pos + w);
pos += w + 1;
}
if let Some(ls) = line_sets.last() {
// Stretch the previous border if this row is wider.
prev_border.extend_to(pos + ls.width);
}
}
I would guess that line_sets has a length of zero, so line_sets.len() - 1 causes an integer underflow and the value wraps around to usize::MAX.
Added a check for line_sets being zero should solve the problem.
I am far from a Rust expert, but generally I always avoid unchecked subtracting from unsigned integers but rather use the checked implementations.
diff --git a/src/render/text_renderer.rs b/src/render/text_renderer.rs
index e8209e2..e3fcd09 100644
--- a/src/render/text_renderer.rs
+++ b/src/render/text_renderer.rs
@@ -1945,12 +1945,14 @@ impl<D: TextDecorator> Renderer for SubRenderer<D> {
if let Some(RenderLine::Line(prev_border)) = self.lines.back_mut() {
let mut pos = 0;
html_trace!("Merging with last line:\n{}", prev_border.to_string());
- for ls in &line_sets[..line_sets.len() - 1] {
- let w = ls.width;
- html_trace!("pos={}, w={}", pos, w);
- prev_border.join_below(pos + w);
- next_border.join_above(pos + w);
- pos += w + 1;
+ if line_sets.len() > 0 {
+ for ls in &line_sets[..line_sets.len() - 1] {
+ let w = ls.width;
+ html_trace!("pos={}, w={}", pos, w);
+ prev_border.join_below(pos + w);
+ next_border.join_above(pos + w);
+ pos += w + 1;
+ }
}
if let Some(ls) = line_sets.last() {
// Stretch the previous border if this row is wider.
I tried it and that fixes the panic. No clue whether it needs further adjustments for fixing functionality.
I agree that should stop the panic. It might be the right fix, but I haven't had time to look into whether it makes sense for line_sets to be empty at that point.
With some fuzzing I have generated an input which reproduces the panic, though.
Fix released in 0.16.5 (at least with the example I found by fuzzing). Thanks again for the report!
Great. Thank you for all your hard work.