lol-html
Acting after on_end_tag
I'm not sure if this is a feature request, but I have tried using on_end_tag to do something after a tag has been closed. Unfortunately, the handler is invoked before the tag is written to the sink. This is clearly intentional, since it lets the handler modify things like the tag name or append content after the tag, but it also means that you cannot easily communicate with the sink.
My idea was to instruct the sink to output or suppress content outside of an element of interest (e.g. to "select" a certain element exclusively). I am thus flipping a flag on enter/leave. The result is that my closing tag is no longer emitted, because the flag is already off by the time the tag reaches the sink.
I believe there are use cases where one wants to have code run after the tag has been closed and emitted to the sink, and I'm not sure if this is possible at all at the moment.
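To make the ordering problem concrete, here is a minimal sketch that uses only the standard library. It is a toy model, not lol-html: the names Token, Gate and run are hypothetical, and the only thing it borrows from the description above is the assumption that the end tag handler runs before the end tag is handed to the sink.

```rust
// Toy model of the event order described above. NOT lol-html's API:
// Token, Gate and run are hypothetical names used for illustration only.
enum Token<'a> {
    Start(&'a str),
    Text(&'a str),
    End(&'a str),
}

// The "sink": it only keeps chunks while the flag is on.
struct Gate {
    can_write: bool,
    out: String,
}

impl Gate {
    fn sink(&mut self, chunk: &str) {
        if self.can_write {
            self.out.push_str(chunk);
        }
    }
}

// Tries to keep only the `selected` element by flipping the flag on enter/leave.
fn run(tokens: &[Token], selected: &str) -> String {
    let mut gate = Gate { can_write: false, out: String::new() };
    for token in tokens {
        match token {
            Token::Start(name) => {
                // The start handler runs before the start tag is emitted.
                if *name == selected {
                    gate.can_write = true;
                }
                gate.sink(&format!("<{}>", name));
            }
            Token::Text(text) => gate.sink(text),
            Token::End(name) => {
                // The end handler also runs before the end tag is emitted...
                if *name == selected {
                    gate.can_write = false;
                }
                // ...so the end tag reaches a sink that is already closed.
                gate.sink(&format!("</{}>", name));
            }
        }
    }
    gate.out
}

fn main() {
    let tokens = [
        Token::Text("This "),
        Token::Start("a"),
        Token::Text("link"),
        Token::End("a"),
        Token::Text(" is an example."),
    ];
    // Prints "<a>link": the closing tag is missing.
    println!("{}", run(&tokens, "a"));
}
```

In this model the start tag survives because the flag is switched on before the tag is emitted, while the end tag is lost for the mirror-image reason.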
I see you've proposed #109 to add this capability.
Curious as to why this didn't seem to work using the existing on_end_tag, I had a go at getting it to work. This is my code to display only the a tags in a document, including the start and end tags. I assume that you've got something similar to the can_write flag in your code. Adding the extra string was the only additional step needed. It might be considered slightly hacky that I construct the end tag string manually, but end tags are pretty simple.
I don't object to your proposed change. I just wanted to understand why it was a problem. Have I understood it correctly, or have I missed an aspect of the problem you're describing?
use lol_html::{element, HtmlRewriter, Settings};
use std::{cell::RefCell, error::Error, rc::Rc};

const PAGE: &str = "
<html>
This <a href=\"http://example.com\">link</a> is an example.
</html>
";

// State shared between the element handlers and the output sink.
struct OutputHandler {
    // Whether the sink should print the chunks it receives.
    can_write: bool,
    // Pending text (the rebuilt end tag) to print on the next sink call.
    extra: String,
}

impl OutputHandler {
    fn on(&mut self) {
        self.can_write = true;
    }

    fn off(&mut self) {
        self.can_write = false;
    }

    fn push(&mut self, extra: &str) {
        self.extra.push_str(extra)
    }
}

fn main() -> Result<(), Box<dyn Error>> {
    let output = Rc::new(RefCell::new(OutputHandler {
        can_write: false,
        extra: String::new(),
    }));

    let element_content_handlers = vec![element!("a", |a| {
        // Start tag seen: let the sink print from here on.
        output.borrow_mut().on();
        let output = output.clone();
        a.on_end_tag(move |tag| {
            // The end tag has not reached the sink yet, so queue a copy of it
            // by hand before switching output off.
            let mut handler = output.borrow_mut();
            handler.push(&format!("</{}>", tag.name()));
            handler.off();
            Ok(())
        })?;
        Ok(())
    })];

    let output = output.clone();
    let mut rewriter = HtmlRewriter::new(
        Settings {
            element_content_handlers,
            ..Settings::default()
        },
        |chunk: &[u8]| {
            let mut handler = output.borrow_mut();
            // Flush the queued end tag first, then print the chunk only if
            // output is currently switched on.
            if !handler.extra.is_empty() {
                print!("{}", handler.extra);
                handler.extra.clear();
            }
            if handler.can_write {
                print!("{}", String::from_utf8_lossy(chunk))
            }
        },
    );

    rewriter.write(PAGE.as_ref())?;
    rewriter.end()?;
    Ok(())
}
You're right that it can be somewhat emulated, but it's quite inconvenient. This solution also always inserts a closing tag, even if one did not exist in the original document. For me the biggest issue was actually that I attempted to maintain a somewhat accurate tag stack to make more meaningful decisions, and having on_end_tag fire "within" the stack level creates a lot of complexity.
However, right now all of this is entirely blocked on #110 anyway. A solution to that might change the situation somewhat.
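The tag stack complication can be illustrated with another standard-library-only sketch. Again this is a toy model under stated assumptions, not lol-html code: Event and trace are hypothetical names, and the pop_after_emit switch simply contrasts popping the stack in a handler that runs before the end tag is emitted with popping it in a hypothetical hook that runs afterwards.

```rust
// Toy model, not lol-html: Event and trace are hypothetical names.
#[derive(Clone, Copy)]
enum Event<'a> {
    Start(&'a str),
    End(&'a str),
}

// Returns each emitted tag together with the stack depth the sink observes
// at the moment the tag is emitted.
fn trace(events: &[Event], pop_after_emit: bool) -> Vec<(String, usize)> {
    let mut stack: Vec<&str> = Vec::new();
    let mut seen = Vec::new();
    for event in events {
        match *event {
            Event::Start(name) => {
                // The start handler pushes before the start tag is emitted.
                stack.push(name);
                seen.push((format!("<{}>", name), stack.len()));
            }
            Event::End(name) => {
                if !pop_after_emit {
                    // Handler runs before emission: the element is already
                    // off the stack when its own end tag is emitted.
                    stack.pop();
                }
                seen.push((format!("</{}>", name), stack.len()));
                if pop_after_emit {
                    // Hypothetical post-emit hook: pop once the tag is out.
                    stack.pop();
                }
            }
        }
    }
    seen
}

fn main() {
    let events = [
        Event::Start("p"),
        Event::Start("a"),
        Event::End("a"),
        Event::End("p"),
    ];
    for (tag, depth) in trace(&events, false) {
        println!("pop before emit: {} at depth {}", tag, depth);
    }
    for (tag, depth) in trace(&events, true) {
        println!("pop after emit:  {} at depth {}", tag, depth);
    }
}
```

With the pop happening before emission, a start tag and its matching end tag are seen at different depths (2 and 1 for the a element here), so any sink logic keyed on depth has to special-case end tags. With a post-emit pop, both are seen at the same depth.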
Would this be easier if, rather than having the on_end_tag method on the Element, there was a separate end tag handler, like there is a separate element text handler? That was my original idea, but the on_end_tag callback was easier to implement.
Potentially. The nice aspect of the current on_end_tag approach is that you can pass state from the start tag to the end tag, but with the need to maintain a stack anyway, that might not be necessary.