html5ever
Segmentation fault on extremely pathological input
From the checkout root:
perl -We 'use strict; my $q = 40000; print "<a>"x$q, "<i>"x$q, "</a>"x$q; ' |
target/release/examples/html2html
results in Segmentation fault (core dumped).
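For anyone who would rather drive the reproduction from Rust than from Perl, here is a minimal sketch that builds the same byte sequence as the one-liner above. The function name `pathological_input` is made up for illustration and is not part of html5ever.

```rust
/// Builds the same input as the Perl one-liner: `q` copies of "<a>",
/// then `q` copies of "<i>", then `q` copies of "</a>".
/// (Hypothetical helper, not an html5ever API.)
fn pathological_input(q: usize) -> String {
    // 3 bytes per "<a>", 3 per "<i>", 4 per "</a>": 10 * q bytes in total.
    let mut s = String::with_capacity(q * 10);
    s.push_str(&"<a>".repeat(q));
    s.push_str(&"<i>".repeat(q));
    s.push_str(&"</a>".repeat(q));
    s
}
```

Calling `pathological_input(40000)` yields the 400,000-byte document used in the report; writing it to a file and piping that file into `html2html` should be equivalent to the Perl pipeline.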
Trying to use Valgrind results in:
==8667== Memcheck, a memory error detector
==8667== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==8667== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==8667== Command: target/release/examples/html2html
==8667==
==8667== Conditional jump or move depends on uninitialised value(s)
==8667== at 0x155B89: process_to_completion<alloc::rc::Rc<markup5ever::rcdom::Node>,markup5ever::rcdom::RcDom> (mod.rs:306)
==8667== by 0x155B89: <html5ever::tree_builder::TreeBuilder<Handle, Sink> as html5ever::tokenizer::interface::TokenSink>::process_token (mod.rs:474)
==8667== by 0x175B84: <html5ever::tokenizer::Tokenizer<Sink>>::process_token (mod.rs:233)
==8667== by 0x157CDA: process_token_and_continue<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>> (mod.rs:238)
==8667== by 0x157CDA: emit_eof<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>> (mod.rs:542)
==8667== by 0x157CDA: eof_step<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>> (mod.rs:1320)
==8667== by 0x157CDA: end<html5ever::tree_builder::TreeBuilder<alloc::rc::Rc<markup5ever::rcdom::Node>, markup5ever::rcdom::RcDom>> (mod.rs:1285)
==8667== by 0x157CDA: finish<markup5ever::rcdom::RcDom> (driver.rs:102)
==8667== by 0x157CDA: <tendril::stream::Utf8LossyDecoder<Sink, A> as tendril::stream::TendrilSink<tendril::fmt::Bytes, A>>::finish (stream.rs:219)
==8667== by 0x185742: read_from<tendril::stream::Utf8LossyDecoder<html5ever::driver::Parser<markup5ever::rcdom::RcDom>, tendril::tendril::NonAtomic>,tendril::fmt::Bytes,tendril::tendril::NonAtomic,std::io::stdio::StdinLock> (stream.rs:76)
==8667== by 0x185742: html2html::main (html2html.rs:39)
==8667== by 0x1AA0BC: __rust_maybe_catch_panic (lib.rs:98)
==8667== by 0x1A38D4: try<(),closure> (panicking.rs:458)
==8667== by 0x1A38D4: catch_unwind<closure,()> (panic.rs:361)
==8667== by 0x1A38D4: std::rt::lang_start (rt.rs:59)
==8667== by 0x569D4D9: (below main) (in /usr/lib64/libc-2.25.so)
==8667== Uninitialised value was created by a stack allocation
==8667== at 0x1538BD: <html5ever::tree_builder::TreeBuilder<Handle, Sink> as html5ever::tokenizer::interface::TokenSink>::process_token (mod.rs:417)
==8667==
==8667== Stack overflow in thread #1: can't grow stack to 0x1ffe802000
==8667==
==8667== Process terminating with default action of signal 6 (SIGABRT): dumping core
==8667== at 0x56B366B: raise (in /usr/lib64/libc-2.25.so)
==8667== by 0x56B546F: abort (in /usr/lib64/libc-2.25.so)
==8667== by 0x1A10FA: abort_internal (mod.rs:175)
==8667== by 0x1A10FA: std::sys_common::util::abort (util.rs:43)
==8667== by 0x1A2166: std::sys::imp::stack_overflow::imp::signal_handler (stack_overflow.rs:112)
==8667== by 0x52592BF: ??? (in /usr/lib64/libpthread-2.25.so)
==8667== by 0x19B73B: get<std::io::buffered::LineWriter<std::io::stdio::Maybe<std::io::stdio::StdoutRaw>>> (cell.rs:1158)
==8667== by 0x19B73B: try_borrow_mut<std::io::buffered::LineWriter<std::io::stdio::Maybe<std::io::stdio::StdoutRaw>>> (cell.rs:697)
==8667== by 0x19B73B: borrow_mut<std::io::buffered::LineWriter<std::io::stdio::Maybe<std::io::stdio::StdoutRaw>>> (cell.rs:668)
==8667== by 0x19B73B: <std::io::stdio::StdoutLock<'a> as std::io::Write>::write (stdio.rs:467)
==8667==
==8667== HEAP SUMMARY:
==8667== in use at exit: 32 bytes in 1 blocks
==8667== total heap usage: 6 allocs, 5 frees, 2,000 bytes allocated
==8667==
==8667== LEAK SUMMARY:
==8667== definitely lost: 0 bytes in 0 blocks
==8667== indirectly lost: 0 bytes in 0 blocks
==8667== possibly lost: 0 bytes in 0 blocks
==8667== still reachable: 32 bytes in 1 blocks
==8667== suppressed: 0 bytes in 0 blocks
==8667== Rerun with --leak-check=full to see details of leaked memory
==8667==
==8667== For counts of detected and suppressed errors, rerun with: -v
==8667== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
and building with MemorySanitizer gives:
RUSTFLAGS="-Z sanitizer=memory" cargo build --examples --release
Compiling matches v0.1.6
Compiling mac v0.1.1
Compiling typed-arena v1.3.0
Compiling log v0.3.8
Compiling lazy_static v0.2.8
Compiling rustc-serialize v0.3.24
Compiling semver-parser v0.7.0
Compiling libc v0.2.24
Compiling unicode-xid v0.0.4
Compiling siphasher v0.2.2
Compiling void v1.0.2
Compiling string_cache_shared v0.3.0
Compiling getopts v0.2.14
Compiling term v0.4.6
Compiling serde v1.0.9
Compiling precomputed-hash v0.1.0
Compiling quote v0.3.15
Compiling utf-8 v0.7.1
Compiling unreachable v0.1.1
Compiling synom v0.11.3
Compiling phf_shared v0.7.21
Compiling rand v0.3.15
Compiling time v0.1.37
Compiling debug_unreachable v0.1.1
Compiling phf v0.7.21
Compiling syn v0.11.11
Compiling futf v0.1.3
Compiling semver v0.6.0
Compiling tendril v0.3.1
Compiling phf_generator v0.7.21
Compiling phf_codegen v0.7.21
Compiling string_cache_codegen v0.4.0
Compiling rustc_version v0.2.1
Compiling rustc-test v0.2.0
error: failed to run custom build command for `rustc-test v0.2.0`
process didn't exit successfully: `/home/dobenour/repos/rust/html5ever/target/release/build/rustc-test-fd34d19aab7d5e2b/build-script-build` (exit code: 77)
--- stderr
==7470==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x558e7123be92 in core::option::{{impl}}::unwrap_or_else<std::ffi::os_str::OsString,closure> /checkout/src/libcore/option.rs:369
#1 0x558e7123be92 in rustc_version::version_meta::h22208f74f9fb12b7 /home/dobenour/.cargo/registry/src/github.com-1ecc6299db9ec823/rustc_version-0.2.1/src/lib.rs:129
#2 0x558e7123339a in build_script_build::rustc_emits_allow_fail /home/dobenour/.cargo/registry/src/github.com-1ecc6299db9ec823/rustc-test-0.2.0/build.rs:13
#3 0x558e7123339a in build_script_build::main::hd4013b2ed62a1483 /home/dobenour/.cargo/registry/src/github.com-1ecc6299db9ec823/rustc-test-0.2.0/build.rs:7
#4 0x558e7126e08c in __rust_maybe_catch_panic /checkout/src/libpanic_unwind/lib.rs:98
#5 0x558e71267ac4 in std::panicking::try<(),closure> /checkout/src/libstd/panicking.rs:458
#6 0x558e71267ac4 in std::panic::catch_unwind<closure,()> /checkout/src/libstd/panic.rs:361
#7 0x558e71267ac4 in std::rt::lang_start::h0bb7e052899843ce /checkout/src/libstd/rt.rs:59
#8 0x558e71233e54 in main (/home/dobenour/repos/rust/html5ever/target/release/build/rustc-test-fd34d19aab7d5e2b/build-script-build+0xce54)
#9 0x7f606eb7b4d9 in __libc_start_main (/lib64/libc.so.6+0x204d9)
#10 0x558e712316b9 in _start (/home/dobenour/repos/rust/html5ever/target/release/build/rustc-test-fd34d19aab7d5e2b/build-script-build+0xa6b9)
SUMMARY: MemorySanitizer: use-of-uninitialized-value /checkout/src/libcore/option.rs:369 in core::option::{{impl}}::unwrap_or_else<std::ffi::os_str::OsString,closure>
Exiting
warning: build failed, waiting for other jobs to finish...
error: build failed
So does
perl -We 'use strict; my $q = 40000; print "<a>"x$q, "<i>"x$q, "</a>"x$q; ' |
target/release/examples/html2html
It segfaults while printing the tree to stdout.
After I upgraded Rust, I got fatal runtime error: stack overflow.
So the problem is in the tree traversal.
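If the overflow does come from a recursive walk, one general way around it is to traverse with an explicit, heap-allocated stack instead of the call stack. The sketch below shows the idea on a toy tree; `Node`, `Step` and `serialize` are illustrative names and are not the rcdom types or the html5ever serializer.

```rust
/// Toy element node (an assumption for illustration, not markup5ever::rcdom::Node).
struct Node {
    name: String,
    children: Vec<Node>,
}

/// Work items for the explicit stack: either emit a node's start tag
/// and schedule its children, or emit a pending end tag.
enum Step<'a> {
    Open(&'a Node),
    Close(&'a str),
}

/// Serializes the tree without recursion, so the depth of the tree
/// costs heap memory in `stack` rather than call-stack frames.
fn serialize(root: &Node) -> String {
    let mut out = String::new();
    let mut stack = vec![Step::Open(root)];
    while let Some(step) = stack.pop() {
        match step {
            Step::Open(node) => {
                out.push('<');
                out.push_str(&node.name);
                out.push('>');
                // The end tag goes on the stack first so it is popped
                // only after all children have been handled.
                stack.push(Step::Close(node.name.as_str()));
                // Push children in reverse so they pop in document order.
                for child in node.children.iter().rev() {
                    stack.push(Step::Open(child));
                }
            }
            Step::Close(name) => {
                out.push_str("</");
                out.push_str(name);
                out.push('>');
            }
        }
    }
    out
}
```

Note that this only covers the traversal. The compiler-generated `Drop` for a nested structure like this toy `Node` is itself recursive, so a very deep tree could still overflow the stack when it is freed unless that is made iterative as well.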
Reproducible in 0.22.2
I think html5ever should impose a nesting limit. This would solve this and many other problems related to excessive nesting depth. Furthermore, Safari already implements such a limit.
FWIW, the nesting limit for Blink and Firefox is now 512. See https://github.com/whatwg/html/issues/3732#issuecomment-453965126 and especially the links from the following comment.
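To make the nesting-limit idea concrete, here is a rough sketch of the bookkeeping such a limit needs. It is not html5ever code and does not claim to match how Blink, Firefox or Safari implement their limits; `DepthLimiter` and its methods are hypothetical names, and the policy shown (elements past the limit are treated as already-closed siblings, and their end tags are swallowed) is just one simple option.

```rust
/// Tracks nesting depth and refuses to go deeper than `limit`.
/// (Hypothetical type for illustration only.)
struct DepthLimiter {
    limit: usize,
    depth: usize,
    /// Start tags seen past the limit whose end tags must be ignored.
    flattened: usize,
}

impl DepthLimiter {
    fn new(limit: usize) -> Self {
        DepthLimiter { limit, depth: 0, flattened: 0 }
    }

    /// Called for a start tag. Returns true if the element may be
    /// nested as a child, false if it was flattened into a sibling.
    fn open(&mut self) -> bool {
        if self.depth < self.limit {
            self.depth += 1;
            true
        } else {
            self.flattened += 1;
            false
        }
    }

    /// Called for an end tag. End tags of flattened elements are
    /// consumed first so they do not close a real ancestor.
    fn close(&mut self) {
        if self.flattened > 0 {
            self.flattened -= 1;
        } else if self.depth > 0 {
            self.depth -= 1;
        }
    }

    fn depth(&self) -> usize {
        self.depth
    }
}
```

With `DepthLimiter::new(512)`, using the figure quoted above, 40000 consecutive start tags never push the depth past 512, so any later recursive traversal or drop is bounded by that depth.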