monolith https://win98icons.alexmeub.com/ -o test.html
Running monolith in the following way:
RUST_BACKTRACE=full monolith https://win98icons.alexmeub.com/ -o test.html
gives me the following backtrace:
thread 'main' panicked at /home/zzzz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tendril-0.4.3/src/buf32.rs:86:59:
tendril: overflow in buffer arithmetic
stack backtrace:
0: 0x55aea2f557cb - std::backtrace_rs::backtrace::libunwind::trace::h3926e05c1d1f3b6d
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5
1: 0x55aea2f557cb - std::backtrace_rs::backtrace::trace_unsynchronized::h9f5691494ac25ae6
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x55aea2f557cb - std::sys_common::backtrace::_print_fmt::h7e6bb7b81bf214f4
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/sys_common/backtrace.rs:67:5
3: 0x55aea2f557cb - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hcf688c88e28c91b4
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/sys_common/backtrace.rs:44:22
4: 0x55aea2f8cf70 - core::fmt::rt::Argument::fmt::h59a542682908b618
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/core/src/fmt/rt.rs:142:9
5: 0x55aea2f8cf70 - core::fmt::write::hce91e70849a27dee
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/core/src/fmt/mod.rs:1120:17
6: 0x55aea2f4a18d - std::io::Write::write_fmt::h0bba58d3b1b495e9
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/io/mod.rs:1762:15
7: 0x55aea2f555b4 - std::sys_common::backtrace::_print::hf3a4f110a22f16df
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/sys_common/backtrace.rs:47:5
8: 0x55aea2f555b4 - std::sys_common::backtrace::print::h0450d1fd5fc83f73
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/sys_common/backtrace.rs:34:9
9: 0x55aea2f74a9a - std::panicking::default_hook::{{closure}}::hee7ec73fab21a529
10: 0x55aea2f7473d - std::panicking::default_hook::he65be6b11b67d1e4
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/panicking.rs:292:9
11: 0x55aea2f74eb8 - std::panicking::rust_panic_with_hook::h9e4f07a5a69c9caf
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/panicking.rs:779:13
12: 0x55aea2f55bae - std::panicking::begin_panic_handler::{{closure}}::h69a9732dd2e7007d
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/panicking.rs:657:13
13: 0x55aea2f559e6 - std::sys_common::backtrace::__rust_end_short_backtrace::hf159dc40d4738bc4
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/sys_common/backtrace.rs:170:18
14: 0x55aea2f74be2 - rust_begin_unwind
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/panicking.rs:645:5
15: 0x55aea2c7fb55 - core::panicking::panic_fmt::hf38ef33e65607e17
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/core/src/panicking.rs:72:14
16: 0x55aea2c80213 - core::panicking::panic_display::h695159eb72fa602b
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/core/src/panicking.rs:178:5
17: 0x55aea2c80213 - core::panicking::panic_str::h839a1b401ad563bd
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/core/src/panicking.rs:152:5
18: 0x55aea2c80213 - core::option::expect_failed::hcea3d24ddc96ad3d
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/core/src/option.rs:1985:5
19: 0x55aea2c8e429 - tendril::tendril::Tendril<F,A>::push_bytes_without_validating::h50e12f85f7f35d1a
20: 0x55aea2c9370b - monolith::html::set_node_attr::heb9ad581297b57be
21: 0x55aea2c961d1 - monolith::html::retrieve_and_embed_asset::h6bae4bc6359cc639
22: 0x55aea2c9970c - monolith::html::walk_and_embed_assets::hd91638de90a0c14b
23: 0x55aea2c99455 - monolith::html::walk_and_embed_assets::hd91638de90a0c14b
24: 0x55aea2c99455 - monolith::html::walk_and_embed_assets::hd91638de90a0c14b
25: 0x55aea2c96f99 - monolith::html::walk_and_embed_assets::hd91638de90a0c14b
26: 0x55aea2c8301e - monolith::main::h5d9fab2f19629c63
27: 0x55aea2c8abb3 - std::sys_common::backtrace::__rust_begin_short_backtrace::hb984754bdbd206ae
28: 0x55aea2c8aec9 - std::rt::lang_start::{{closure}}::h7a809d95489ba334
29: 0x55aea2f74ad4 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h79a1db7ef67e1194
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/core/src/ops/function.rs:284:13
30: 0x55aea2f74ad4 - std::panicking::try::do_call::h4d3f550f9f5d649e
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/panicking.rs:552:40
31: 0x55aea2f74ad4 - std::panicking::try::h01b4e15c9906c472
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/panicking.rs:516:19
32: 0x55aea2f74ad4 - std::panic::catch_unwind::h8f18404de61a31a9
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/panic.rs:142:14
33: 0x55aea2f74ad4 - std::rt::lang_start_internal::{{closure}}::h9b2491841fb26cd6
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/rt.rs:148:48
34: 0x55aea2f74ad4 - std::panicking::try::do_call::ha5fc70c953975daf
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/panicking.rs:552:40
35: 0x55aea2f74ad4 - std::panicking::try::hf2c81cc356fa2f99
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/panicking.rs:516:19
36: 0x55aea2f60b2b - std::panic::catch_unwind::h6741cb988acde652
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/panic.rs:142:14
37: 0x55aea2f60b2b - std::rt::lang_start_internal::h65f0529f246fb3ed
at /build/rustc-60UC9b/rustc-1.75.0+dfsg0ubuntu1~bpo0/library/std/src/rt.rs:148:20
38: 0x55aea2c8aebe - std::rt::lang_start::ha14658b2162455fd
39: 0x7fe038c08083 - __libc_start_main
at /build/glibc-wuryBv/glibc-2.31/csu/../csu/libc-start.c:308:16
40: 0x55aea2c8042e - _start
41: 0x0 - <unknown>
Let me know if you need more information!
What an interesting find!
So, it looks like https://win98icons.alexmeub.com/win-icons.min.css refers to https://win98icons.alexmeub.com/css_sprites.png more than 1000 times, which means Monolith will try to embed the same file over 1000 times, and that makes the process run out of RAM (or just upsets it so much that it gives up and quits). Even if it did manage to save the page as one file, opening it in a browser later would likely crash the browser; it'd be a 100MB file at least.
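To put rough numbers on that estimate, here is a minimal sketch of the size arithmetic. The function names are made up for illustration, and the 100 KiB sprite size used in the note below is an assumption (the thread never states the real size). The 32-bit check is there because tendril documents a 4 GB maximum length and the panic originates in its `buf32.rs`; whether this particular page actually crossed that limit is not established here.

```rust
/// Length of the padded base64 encoding of `n` raw bytes
/// (every group of up to 3 bytes becomes 4 characters).
pub fn base64_len(n: u64) -> u64 {
    (n + 2) / 3 * 4
}

/// Total base64 characters produced when one asset is inlined `refs` times.
/// Ignores the short `data:<mime>;base64,` prefix on each copy.
pub fn embedded_total(asset_bytes: u64, refs: u64) -> u64 {
    base64_len(asset_bytes) * refs
}

/// True if a buffer of `len` bytes cannot be described by a 32-bit length.
pub fn exceeds_u32(len: u64) -> bool {
    len > u32::MAX as u64
}
```

With a hypothetical 100 KiB sprite referenced 1000 times, `embedded_total(100 * 1024, 1000)` comes to 136,536,000 characters, which is in line with the "100MB at least" guess above. A larger sprite or more references scales that linearly.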
I'd say, let's try that page again when I add MHTML support; then it'll save the file just once and reference it multiple times, exactly like on the web.
Found a similar issue! (many refs to an image from CSS) Nice to see it'll be fixed once MHTML support is there 😊
Very cool project!
Yes, very nice project indeed! Now you mentioned MHTML, I cannot get around the conspiracy in my head that the name Monolith is somehow inspired by it :-)
Oh and I see you just added cargo to the dependencies when installing from source. I also needed pkg-config otherwise OpenSSL couldn't be found:
--- stderr
thread 'main' panicked at /home/evvr/.cargo/registry/src/index.crates.io-6f17d22bba15001f/openssl-sys-0.9.77/build/find_normal.rs:191:5:
Could not find directory of OpenSSL installation, and this `-sys` crate cannot
proceed without this knowledge. If OpenSSL is installed and this crate had
trouble finding it, you can set the `OPENSSL_DIR` environment variable for the
compilation process.
Make sure you also have the development packages of openssl installed.
For example, `libssl-dev` on Ubuntu or `openssl-devel` on Fedora.
If you're in a situation where you think the directory *should* be found
automatically, please open a bug at https://github.com/sfackler/rust-openssl
and include information about your system as well as this message.
$HOST = x86_64-unknown-linux-gnu
$TARGET = x86_64-unknown-linux-gnu
openssl-sys = 0.9.77
It looks like you're compiling on Linux and also targeting Linux. Currently this
requires the `pkg-config` utility to find OpenSSL but unfortunately `pkg-config`
could not be found. If you have OpenSSL installed you can likely fix this by
installing `pkg-config`.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
error: failed to compile `monolith v2.8.1 (/home/evvr/temp/monolith)`, intermediate artifacts can be found at `/home/evvr/temp/monolith/target`.
To reuse those artifacts with a future compilation, set the environment variable `CARGO_TARGET_DIR` to that path.
make: *** [Makefile:20: install] Error 101
that makes dependency number 3 😄
It kind of just happened, with MHTML sounding similar. The name for the project (Monolith) was fitting, since it describes the result (one file, with everything in it), and also happens to contain all the letters of HTML in it. But with MHTML, it even ends up fitting "Monolithic HTML" well, pure luck.
pkg-config is usually pretty standard on every system, just like make and everything else... I should probably mention something like "build-tools", thank you for the tip!
I can't find a way to easily use some sort of on-disk buffering instead of RAM in Rust; besides, that would probably slow the program down.
I'm considering simply optimizing caching and cleaning up unneeded data more promptly. For example, instead of holding every retrieved asset in RAM in that global cache object, monolith would keep each asset on disk via tempfile and hold only references to those on-disk files. I should probably add an ENV flag to disable caching completely, just in case.
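A rough sketch of what such an on-disk cache could look like, using only the standard library. Everything here is hypothetical: `DiskCache`, its methods, and the `MONOLITH_NO_CACHE` variable are illustrative names, not existing monolith code, and a real version would more likely use the `tempfile` crate for unique, self-cleaning files.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical asset cache: bytes live in files, only paths stay in RAM.
pub struct DiskCache {
    dir: PathBuf,
    index: HashMap<String, PathBuf>, // asset URL -> file holding its bytes
    enabled: bool,
}

impl DiskCache {
    /// Creates `dir` if needed. With `enabled == false` nothing is cached.
    pub fn new(dir: PathBuf, enabled: bool) -> io::Result<Self> {
        if enabled {
            fs::create_dir_all(&dir)?;
        }
        Ok(DiskCache { dir, index: HashMap::new(), enabled })
    }

    /// Disables caching when the (made-up) MONOLITH_NO_CACHE variable is set.
    pub fn from_env(dir: PathBuf) -> io::Result<Self> {
        let enabled = std::env::var_os("MONOLITH_NO_CACHE").is_none();
        Self::new(dir, enabled)
    }

    /// Returns the asset bytes, calling `fetch` only on a cache miss.
    pub fn get_or_fetch<F>(&mut self, url: &str, fetch: F) -> io::Result<Vec<u8>>
    where
        F: FnOnce(&str) -> io::Result<Vec<u8>>,
    {
        if !self.enabled {
            return fetch(url);
        }
        if let Some(path) = self.index.get(url) {
            return fs::read(path); // hit: re-read from disk, no network
        }
        let data = fetch(url)?;
        let path = self.dir.join(format!("asset-{}.bin", self.index.len()));
        fs::write(&path, &data)?;
        self.index.insert(url.to_string(), path);
        Ok(data)
    }

    /// Removes the cache directory and forgets all entries.
    pub fn cleanup(&mut self) -> io::Result<()> {
        if self.enabled {
            fs::remove_dir_all(&self.dir)?;
        }
        self.index.clear();
        Ok(())
    }
}
```

The caller passes the fetch step as a closure, so the same asset URL triggers at most one download while caching is enabled; the trade-off is an extra disk read on every hit.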
What's challenging here seems to be deduplicating assets that are essentially identical in the final HTML file. Since all assets (mainly images) are going to be base64-encoded in the same file, there is no way to reference the same asset multiple times. If only url() could reference an attribute of an HTML element... (but attr() is still experimental)
Other projects suffer with this problem as well. SingleFile cannot save this page either.
Holding all retrieved assets in RAM is not a problem as long as they are de-duplicated by their original URI (e.g. the url() value). For example, if there is a global cache table with the asset URI as lookup key and the base64 data URI as value, then each time monolith finds a url() it extracts the URI and does a lookup in the global cache table. If the entry is not there, it fetches the asset and base64-encodes it to fill the cache entry. After all cache entries are generated, monolith can do a stream write to a temp file, filling in the value from the cache table whenever an asset URI is encountered. Once the stream write finishes, the temp file is moved to the final file. The cache table does not need to be on disk.
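The lookup-then-stream-write idea above could be sketched like this. It is a simplified illustration, not monolith's code: `embed_urls` and `base64_encode` are made-up helpers, the MIME type is hard-coded to `image/png`, the `url(...)` matching is naive (no escapes, no skipping of existing `data:` URIs), and fetching is interleaved with writing for brevity instead of filling the table in a separate pass.

```rust
use std::collections::HashMap;
use std::io::{self, Write};

/// Minimal base64 encoder (standard alphabet, with `=` padding).
pub fn base64_encode(data: &[u8]) -> String {
    const T: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = (chunk[0] as u32) << 16 | (b1 as u32) << 8 | b2 as u32;
        out.push(T[(n >> 18) as usize & 63] as char);
        out.push(T[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { T[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { T[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Streams `css` to `out`, replacing each `url(...)` with a data URI.
/// `cache` maps asset URI -> data URI; `fetch` runs once per distinct URI.
/// Returns the number of fetches performed.
pub fn embed_urls<W: Write, F: FnMut(&str) -> Vec<u8>>(
    css: &str,
    cache: &mut HashMap<String, String>,
    mut fetch: F,
    out: &mut W,
) -> io::Result<usize> {
    let mut fetches = 0;
    let mut rest = css;
    while let Some(start) = rest.find("url(") {
        let after = &rest[start + 4..];
        let end = match after.find(')') {
            Some(end) => end,
            None => break, // unterminated url(: copy the remainder verbatim
        };
        let uri = after[..end].trim().trim_matches(|c| c == '"' || c == '\'');
        out.write_all(rest[..start].as_bytes())?;
        if !cache.contains_key(uri) {
            let data = fetch(uri);
            fetches += 1;
            let value = format!("data:image/png;base64,{}", base64_encode(&data));
            cache.insert(uri.to_string(), value);
        }
        write!(out, "url({})", cache[uri])?;
        rest = &after[end + 1..];
    }
    out.write_all(rest.as_bytes())?;
    Ok(fetches)
}
```

In practice `out` would be a buffered writer over a temp file that gets renamed to the final path once the write succeeds. This removes repeated fetching and encoding, but the output still contains one full copy of the data URI per reference, so the final file size is unchanged.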
Other projects suffer with this problem as well. SingleFile cannot save this page either.
For the record, SingleFile proposes to solve this kind of issue with the "self-extracting" file format. With this format, the saved page weighs around 1 MB; see the page in the attached zip file.
I think that'd be a bit too much for monolith, overengineering.
The best way to solve a problem is to blame it on someone else and not solve it at all.
Having a giant file serve as a collection of icons is just a bad engineering decision from the standpoint of letting the thing load, let alone archiving it. Working around badly made and unoptimized websites would take way too much time and effort; I'd rather fix some bugs and implement a couple more required features for monolith instead.