rust-html2text icon indicating copy to clipboard operation
rust-html2text copied to clipboard

Panic: Slice index out of range

Open mdierksen opened this issue 3 months ago • 4 comments

Got this panic with version 0.16.2

thread 'tokio-runtime-worker' panicked at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/html2text-0.16.2/src/render/text_renderer.rs:1948:33:
range end index 18446744073709551615 out of range for slice of length 0

with this call

/// Extract text from HTML.
fn extract_text(html: &str) -> Result<String, &str> {
    let config = config::with_decorator(TrivialDecorator::new());

    config
        .string_from_read(html.as_bytes(), usize::MAX)
        .map_err(|_| "Cannot parse text")
}

This error comes from a stripped release version running in production, so I will not be able to provide further details.

mdierksen avatar Nov 17 '25 10:11 mdierksen

Thank you for the report! Hopefully the panic message will be enough to work out the problem and find a test case (but might not be immediate).

jugglerchris avatar Nov 17 '25 22:11 jugglerchris

The same panic still happens with 0.16.4.

Maybe I am missing something, but looking at the code in append_columns_with_borders function in text_renderer.rs the problem seems to be obvious:

    fn append_columns_with_borders<I>(&mut self, cols: I, collapse: bool) -> Result<()>
    where
        I: IntoIterator<Item = (Self, usize)>,
        Self: Sized,
    {
        use self::TaggedLineElement::Str;
        html_trace!("append_columns_with_borders(collapse={})", collapse);
        html_trace!("self=<<<\n{}>>>", self.to_string());

        self.flush_wrapping()?;

        let mut tot_width = 0;

        let mut line_sets = cols
            .into_iter()
            .map(|(sub_r, rowspan)| {
                let width = sub_r.width;
                let pos = tot_width;
                tot_width += width + 1;
                html_trace!("Adding column:\n{}", sub_r.to_string());
                Ok(LineSet {
                    pos,
                    width,
                    rowspan,
                    lines: sub_r
                        .into_lines()?
                        .into_iter()
                        .map(|mut line| {
                            match line {
                                RenderLine::Text(ref mut tline) => {
                                    tline.pad_to(width, &self.ann_stack);
                                }
                                RenderLine::Line(ref mut border) => {
                                    border.stretch_to(width);
                                }
                            }
                            line
                        })
                        .collect(),
                })
            })
            .collect::<Result<Vec<LineSet<_>>>>()?;

        {
            // Character position from line sets.
            let mut lidx = 0;
            let mut lnextpos = 0;
            for ls in std::mem::take(&mut self.overhang_cells) {
                while lidx < line_sets.len() && line_sets[lidx].pos < ls.pos {
                    let lpos = line_sets[lidx].pos;
                    lnextpos = lpos + line_sets[lidx].width + 1;
                    lidx += 1;
                }
                if lidx >= line_sets.len() {
                    // This row doesn't extend this far.
                    if lnextpos < ls.pos {
                        // We need padding
                        line_sets.push(LineSet {
                            pos: lnextpos,
                            width: ls.pos.saturating_sub(lnextpos + 1),
                            rowspan: 1,
                            lines: Default::default(),
                        });
                    }
                    if ls.pos + ls.width > tot_width {
                        // +1 because we subtract one later
                        tot_width = ls.pos + ls.width + 1;
                    }
                    line_sets.push(ls);
                } else {
                    //debug_assert_eq!(ls.width, line_sets[lidx].width);
                    line_sets[lidx] = ls;
                }
            }
        }

        // If we haven't added any columns, tot_width will be 0; otherwise
        // it will be one too high for a final unneeded border.
        tot_width = tot_width.saturating_sub(1);

        let mut next_border = BorderHoriz::new(tot_width, self.ann_stack.clone());

        // Join the vertical lines to all the borders
        if let Some(RenderLine::Line(prev_border)) = self.lines.back_mut() {
            let mut pos = 0;
            html_trace!("Merging with last line:\n{}", prev_border.to_string());
            for ls in &line_sets[..line_sets.len() - 1] {                           <============ PANIC here
                let w = ls.width;
                html_trace!("pos={}, w={}", pos, w);
                prev_border.join_below(pos + w);
                next_border.join_above(pos + w);
                pos += w + 1;
            }
            if let Some(ls) = line_sets.last() {
                // Stretch the previous border if this row is wider.
                prev_border.extend_to(pos + ls.width);
            }
        }

I would guess that line_sets has a length of zero, so line_sets.len() - 1 causes an integer underflow and the value wraps around to usize::MAX.

Added a check for line_sets being zero should solve the problem.

I am far from a Rust expert, but generally I always avoid unchecked subtracting from unsigned integers but rather use the checked implementations.

mdierksen avatar Dec 01 '25 12:12 mdierksen

diff --git a/src/render/text_renderer.rs b/src/render/text_renderer.rs
index e8209e2..e3fcd09 100644
--- a/src/render/text_renderer.rs
+++ b/src/render/text_renderer.rs
@@ -1945,12 +1945,14 @@ impl<D: TextDecorator> Renderer for SubRenderer<D> {
         if let Some(RenderLine::Line(prev_border)) = self.lines.back_mut() {
             let mut pos = 0;
             html_trace!("Merging with last line:\n{}", prev_border.to_string());
-            for ls in &line_sets[..line_sets.len() - 1] {
-                let w = ls.width;
-                html_trace!("pos={}, w={}", pos, w);
-                prev_border.join_below(pos + w);
-                next_border.join_above(pos + w);
-                pos += w + 1;
+            if line_sets.len() > 0 {
+                for ls in &line_sets[..line_sets.len() - 1] {
+                    let w = ls.width;
+                    html_trace!("pos={}, w={}", pos, w);
+                    prev_border.join_below(pos + w);
+                    next_border.join_above(pos + w);
+                    pos += w + 1;
+                }
             }
             if let Some(ls) = line_sets.last() {
                 // Stretch the previous border if this row is wider.

I tried it and that fixes the panic. No clue whether it needs further adjustments for fixing functionality.

mdierksen avatar Dec 01 '25 16:12 mdierksen

I agree that should stop the panic. It might be the right fix, but I haven't had time to look into whether it makes sense for line_sets to be empty at that point. With some fuzzing I have generated an input which reproduces the panic, though.

jugglerchris avatar Dec 01 '25 22:12 jugglerchris

Fix released in 0.16.5 (at least with the example I found by fuzzing). Thanks again for the report!

jugglerchris avatar Dec 06 '25 07:12 jugglerchris

Great. Thank you for all your hard work.

mdierksen avatar Dec 08 '25 13:12 mdierksen