html5ever

Segmentation fault on extremely pathological input

Open · DemiMarie opened this issue (servo/html5ever#290) 8 years ago • 6 comments

From the checkout root, with the examples built in release mode (cargo build --release --examples):

perl -We 'use strict; my $q = 40000; print "<a>"x$q, "<i>"x$q, "</a>"x$q; ' |
target/release/examples/html2html

crashes with "Segmentation fault (core dumped)".
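For a regression test it may be handier to generate the same input without Perl. A minimal Rust sketch, assuming only the standard library; the function name pathological_input is hypothetical and not part of html5ever:

```rust
/// Builds the same document as the Perl one-liner above:
/// `q` copies of "<a>", then `q` copies of "<i>", then `q` copies of "</a>".
fn pathological_input(q: usize) -> String {
    // Each repetition contributes 3 + 3 + 4 = 10 bytes.
    let mut s = String::with_capacity(q * 10);
    s.push_str(&"<a>".repeat(q));
    s.push_str(&"<i>".repeat(q));
    s.push_str(&"</a>".repeat(q));
    s
}

fn main() {
    // Write to stdout so the output can be piped into the html2html example,
    // exactly like the Perl command.
    print!("{}", pathological_input(40_000));
}
```

Compiled to a binary and piped into target/release/examples/html2html in place of the Perl command, it should produce the identical 400,000-byte input.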

Running it under Valgrind reports one uninitialised-value error followed by a stack overflow:

==8667== Memcheck, a memory error detector
==8667== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==8667== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==8667== Command: target/release/examples/html2html
==8667== 
==8667== Conditional jump or move depends on uninitialised value(s)
==8667==    at 0x155B89: process_to_completion<alloc::rc::Rc<markup5ever::rcdom::Node>,markup5ever::rcdom::RcDom> (mod.rs:306)
==8667==    by 0x155B89: <html5ever::tree_builder::TreeBuilder<Handle, Sink> as html5ever::tokenizer::interface::TokenSink>::process_token (mod.rs:474)
==8667==    by 0x175B84: <html5ever::tokenizer::Tokenizer<Sink>>::process_token (mod.rs:233)
==8667==    by 0x157CDA: process_token_and_continue<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>> (mod.rs:238)
==8667==    by 0x157CDA: emit_eof<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>> (mod.rs:542)
==8667==    by 0x157CDA: eof_step<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>> (mod.rs:1320)
==8667==    by 0x157CDA: end<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>> (mod.rs:1285)
==8667==    by 0x157CDA: finish<markup5ever::rcdom::RcDom> (driver.rs:102)
==8667==    by 0x157CDA: <tendril::stream::Utf8LossyDecoder<Sink, A> as tendril::stream::TendrilSink<tendril::fmt::Bytes, A>>::finish (stream.rs:219)
==8667==    by 0x185742: read_from<tendril::stream::Utf8LossyDecoder<html5ever::driver::Parser<markup5ever::rcdom::RcDom>, tendril::tendril::NonAtomic>,tendril::fmt::Bytes,tendril::tendril::NonAtomic,std::io::stdio::StdinLock> (stream.rs:76)
==8667==    by 0x185742: html2html::main (html2html.rs:39)
==8667==    by 0x1AA0BC: __rust_maybe_catch_panic (lib.rs:98)
==8667==    by 0x1A38D4: try<(),closure> (panicking.rs:458)
==8667==    by 0x1A38D4: catch_unwind<closure,()> (panic.rs:361)
==8667==    by 0x1A38D4: std::rt::lang_start (rt.rs:59)
==8667==    by 0x569D4D9: (below main) (in /usr/lib64/libc-2.25.so)
==8667==  Uninitialised value was created by a stack allocation
==8667==    at 0x1538BD: <html5ever::tree_builder::TreeBuilder<Handle, Sink> as html5ever::tokenizer::interface::TokenSink>::process_token (mod.rs:417)
==8667== 
==8667== Stack overflow in thread #1: can't grow stack to 0x1ffe802000
==8667== 
==8667== Process terminating with default action of signal 6 (SIGABRT): dumping core
==8667==    at 0x56B366B: raise (in /usr/lib64/libc-2.25.so)
==8667==    by 0x56B546F: abort (in /usr/lib64/libc-2.25.so)
==8667==    by 0x1A10FA: abort_internal (mod.rs:175)
==8667==    by 0x1A10FA: std::sys_common::util::abort (util.rs:43)
==8667==    by 0x1A2166: std::sys::imp::stack_overflow::imp::signal_handler (stack_overflow.rs:112)
==8667==    by 0x52592BF: ??? (in /usr/lib64/libpthread-2.25.so)
==8667==    by 0x19B73B: get<std::io::buffered::LineWriter<std::io::stdio::Maybe<std::io::stdio::StdoutRaw>>> (cell.rs:1158)
==8667==    by 0x19B73B: try_borrow_mut<std::io::buffered::LineWriter<std::io::stdio::Maybe<std::io::stdio::StdoutRaw>>> (cell.rs:697)
==8667==    by 0x19B73B: borrow_mut<std::io::buffered::LineWriter<std::io::stdio::Maybe<std::io::stdio::StdoutRaw>>> (cell.rs:668)
==8667==    by 0x19B73B: <std::io::stdio::StdoutLock<'a> as std::io::Write>::write (stdio.rs:467)
==8667== 
==8667== HEAP SUMMARY:
==8667==     in use at exit: 32 bytes in 1 blocks
==8667==   total heap usage: 6 allocs, 5 frees, 2,000 bytes allocated
==8667== 
==8667== LEAK SUMMARY:
==8667==    definitely lost: 0 bytes in 0 blocks
==8667==    indirectly lost: 0 bytes in 0 blocks
==8667==      possibly lost: 0 bytes in 0 blocks
==8667==    still reachable: 32 bytes in 1 blocks
==8667==         suppressed: 0 bytes in 0 blocks
==8667== Rerun with --leak-check=full to see details of leaked memory
==8667== 
==8667== For counts of detected and suppressed errors, rerun with: -v
==8667== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Building with MemorySanitizer fails even earlier, inside the build script of the rustc-test dependency:

RUSTFLAGS="-Z sanitizer=memory" cargo build --examples --release
   Compiling matches v0.1.6
   Compiling mac v0.1.1
   Compiling typed-arena v1.3.0
   Compiling log v0.3.8
   Compiling lazy_static v0.2.8
   Compiling rustc-serialize v0.3.24
   Compiling semver-parser v0.7.0
   Compiling libc v0.2.24
   Compiling unicode-xid v0.0.4
   Compiling siphasher v0.2.2
   Compiling void v1.0.2
   Compiling string_cache_shared v0.3.0
   Compiling getopts v0.2.14
   Compiling term v0.4.6
   Compiling serde v1.0.9
   Compiling precomputed-hash v0.1.0
   Compiling quote v0.3.15
   Compiling utf-8 v0.7.1
   Compiling unreachable v0.1.1
   Compiling synom v0.11.3
   Compiling phf_shared v0.7.21
   Compiling rand v0.3.15
   Compiling time v0.1.37
   Compiling debug_unreachable v0.1.1
   Compiling phf v0.7.21
   Compiling syn v0.11.11
   Compiling futf v0.1.3
   Compiling semver v0.6.0
   Compiling tendril v0.3.1
   Compiling phf_generator v0.7.21
   Compiling phf_codegen v0.7.21
   Compiling string_cache_codegen v0.4.0
   Compiling rustc_version v0.2.1
   Compiling rustc-test v0.2.0
error: failed to run custom build command for `rustc-test v0.2.0`
process didn't exit successfully: `/home/dobenour/repos/rust/html5ever/target/release/build/rustc-test-fd34d19aab7d5e2b/build-script-build` (exit code: 77)
--- stderr
==7470==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x558e7123be92 in core::option::{{impl}}::unwrap_or_else<std::ffi::os_str::OsString,closure> /checkout/src/libcore/option.rs:369
    #1 0x558e7123be92 in rustc_version::version_meta::h22208f74f9fb12b7 /home/dobenour/.cargo/registry/src/github.com-1ecc6299db9ec823/rustc_version-0.2.1/src/lib.rs:129
    #2 0x558e7123339a in build_script_build::rustc_emits_allow_fail /home/dobenour/.cargo/registry/src/github.com-1ecc6299db9ec823/rustc-test-0.2.0/build.rs:13
    #3 0x558e7123339a in build_script_build::main::hd4013b2ed62a1483 /home/dobenour/.cargo/registry/src/github.com-1ecc6299db9ec823/rustc-test-0.2.0/build.rs:7
    #4 0x558e7126e08c in __rust_maybe_catch_panic /checkout/src/libpanic_unwind/lib.rs:98
    #5 0x558e71267ac4 in std::panicking::try<(),closure> /checkout/src/libstd/panicking.rs:458
    #6 0x558e71267ac4 in std::panic::catch_unwind<closure,()> /checkout/src/libstd/panic.rs:361
    #7 0x558e71267ac4 in std::rt::lang_start::h0bb7e052899843ce /checkout/src/libstd/rt.rs:59
    #8 0x558e71233e54 in main (/home/dobenour/repos/rust/html5ever/target/release/build/rustc-test-fd34d19aab7d5e2b/build-script-build+0xce54)
    #9 0x7f606eb7b4d9 in __libc_start_main (/lib64/libc.so.6+0x204d9)
    #10 0x558e712316b9 in _start (/home/dobenour/repos/rust/html5ever/target/release/build/rustc-test-fd34d19aab7d5e2b/build-script-build+0xa6b9)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /checkout/src/libcore/option.rs:369 in core::option::{{impl}}::unwrap_or_else<std::ffi::os_str::OsString,closure>
Exiting

warning: build failed, waiting for other jobs to finish...
error: build failed

DemiMarie, Jul 31 '17 22:07

So does

perl -We 'use strict; my $q = 40000; print "<a>"x$q, "<i>"x$q, "</a>"x$q; ' |
target/release/examples/html2html

It segfaults while printing the tree to stdout.

DemiMarie, Jul 31 '17 22:07

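After upgrading Rust, I get "fatal runtime error: stack overflow" instead.

So the problem is in the tree traversal: the deeply nested tree is walked recursively, and tens of thousands of levels exhaust the stack.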
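A traversal that keeps its own stack on the heap avoids this failure mode. Below is a minimal Rust sketch on a hypothetical arena-backed Tree type (elements only, no attributes or text; not html5ever's RcDom or its serializer): serialize replaces recursion with an explicit stack of Open/Close tasks. Storing children as indices also keeps dropping the tree non-recursive; Rust's default drop glue for directly nested Box, Vec or Rc nodes recurses just like a naive serializer.

```rust
/// Hypothetical arena-backed tree: node `i` has tag `tags[i]` and
/// child indices `children[i]`. Dropping it frees two flat Vecs, no recursion.
struct Tree {
    tags: Vec<String>,
    children: Vec<Vec<usize>>,
}

impl Tree {
    fn new(root_tag: &str) -> Self {
        Tree { tags: vec![root_tag.to_string()], children: vec![Vec::new()] }
    }

    /// Appends a new element under `parent` and returns its index.
    fn append(&mut self, parent: usize, tag: &str) -> usize {
        let id = self.tags.len();
        self.tags.push(tag.to_string());
        self.children.push(Vec::new());
        self.children[parent].push(id);
        id
    }

    /// Serializes the subtree at `root` without recursion. The explicit
    /// task stack lives on the heap, so depth is bounded by memory, not stack size.
    fn serialize(&self, root: usize) -> String {
        enum Task { Open(usize), Close(usize) }
        let mut out = String::new();
        let mut stack = vec![Task::Open(root)];
        while let Some(task) = stack.pop() {
            match task {
                Task::Open(id) => {
                    out.push('<');
                    out.push_str(&self.tags[id]);
                    out.push('>');
                    // The close tag must come after all children, so push it first.
                    stack.push(Task::Close(id));
                    // Push children in reverse so the first child is popped first.
                    for &child in self.children[id].iter().rev() {
                        stack.push(Task::Open(child));
                    }
                }
                Task::Close(id) => {
                    out.push_str("</");
                    out.push_str(&self.tags[id]);
                    out.push('>');
                }
            }
        }
        out
    }
}

fn main() {
    // A chain of 100,000 nested <i> elements, deeper than the 40,000 in this report.
    let mut tree = Tree::new("html");
    let mut parent = 0;
    for _ in 0..100_000 {
        parent = tree.append(parent, "i");
    }
    println!("{} bytes", tree.serialize(0).len());
}
```

For that chain, serialize returns 700,013 bytes (7 per <i></i> pair plus 13 for the <html> wrapper) without touching more than a constant amount of call stack.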

DemiMarie, Aug 06 '17 14:08

Reproducible in 0.22.2

krk, Apr 23 '18 20:04

I think html5ever should impose a nesting limit. This would solve this and many other problems related to excessive nesting depth. Furthermore, Safari already implements such a limit.

DemiMarie, Apr 25 '18 21:04

FWIW, the nesting limit for Blink and Firefox is now 512. See https://github.com/whatwg/html/issues/3732#issuecomment-453965126 and especially the links from the following comment.
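A nesting limit along these lines could look roughly like the minimal Rust sketch below. It is not html5ever's TreeBuilder and not a description of what Blink or Firefox actually do: the Builder type, the MAX_DEPTH constant and the flattening policy are assumptions for illustration. The assumed policy: once the stack of open elements is MAX_DEPTH deep, new elements are appended to the open element at depth MAX_DEPTH - 1, so they become siblings instead of nesting further, while still being pushed onto the stack so end tags keep matching. MAX_DEPTH is set to 512 to match the figure above.

```rust
/// Hypothetical depth cap, chosen to match the 512 cited for Blink/Firefox.
const MAX_DEPTH: usize = 512;

/// Minimal tree-builder sketch: nodes live in an arena and store their parent index.
struct Builder {
    parent: Vec<Option<usize>>, // parent[i] = parent of node i (None for the root)
    open: Vec<usize>,           // stack of open elements; open[0] is the root
}

impl Builder {
    fn new() -> Self {
        Builder { parent: vec![None], open: vec![0] }
    }

    /// Handles a start tag. Normally the new node goes under the current node
    /// (top of the stack). Once the stack is MAX_DEPTH deep, it goes under the
    /// open element at depth MAX_DEPTH - 1 instead, flattening excess nesting.
    fn start_tag(&mut self) -> usize {
        let parent_slot = self.open.len().min(MAX_DEPTH) - 1;
        let parent = self.open[parent_slot];
        let id = self.parent.len();
        self.parent.push(Some(parent));
        // Still pushed, so later end tags pop the right entries.
        self.open.push(id);
        id
    }

    /// Handles an end tag by popping the current node (tag matching omitted).
    fn end_tag(&mut self) {
        if self.open.len() > 1 {
            self.open.pop();
        }
    }

    /// Depth of node `id` (root = 0), computed iteratively by walking parents.
    fn depth(&self, mut id: usize) -> usize {
        let mut d = 0;
        while let Some(p) = self.parent[id] {
            d += 1;
            id = p;
        }
        d
    }
}

fn main() {
    let mut b = Builder::new();
    let mut deepest = 0;
    for _ in 0..40_000 {
        let id = b.start_tag();
        deepest = deepest.max(b.depth(id));
    }
    for _ in 0..40_000 {
        b.end_tag();
    }
    println!("max depth: {}", deepest);
}
```

With 40,000 nested start tags the deepest node ends up at depth 512, so any recursive pass over the resulting tree is bounded by MAX_DEPTH frames; documents nested less deeply than the limit are built exactly as before.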

mozfreddyb, Jan 14 '19 14:01