lol-html

Acting after on_end_tag

Open mitsuhiko opened this issue 2 years ago • 4 comments

I'm not sure if this is a feature request, but I have tried using on_end_tag to do something after a tag has been closed. Unfortunately, the handler is invoked before the tag is written to the sink. This is clearly intentional, as it lets the handler modify things like the tag name or append content after the tag, but it also means that you cannot easily communicate with the sink.

My idea was to instruct the sink to output or not output content outside of an element of interest (e.g. to "select" a certain element exclusively). I am thus flipping a flag on enter/leave. Because the flag is flipped off before the end tag reaches the sink, my closing tag is no longer emitted.

I believe there are use cases where one wants to have code run after the tag has been closed and emitted to the sink, and I'm not sure if this is possible at all at the moment.
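
To make the ordering concrete, here is a minimal std-only simulation. It is not lol-html code: `GatedSink`, `Event`, `demo_events`, and `run` are hypothetical names made up for illustration. The only thing taken from the description above is that the end tag handler runs before the end tag is written to the sink.

```rust
// Std-only simulation of the ordering problem; no lol-html types are used.

/// A sink that only keeps chunks while `can_write` is set.
struct GatedSink {
    can_write: bool,
    out: String,
}

impl GatedSink {
    fn write(&mut self, chunk: &str) {
        if self.can_write {
            self.out.push_str(chunk);
        }
    }
}

/// Hypothetical token stream standing in for what a rewriter would produce.
enum Event<'a> {
    StartTag(&'a str),
    Text(&'a str),
    EndTag(&'a str),
}

/// The example document from this thread, split into events by hand.
fn demo_events() -> Vec<Event<'static>> {
    vec![
        Event::Text("This "),
        Event::StartTag("<a href=\"http://example.com\">"),
        Event::Text("link"),
        Event::EndTag("</a>"),
        Event::Text(" is an example."),
    ]
}

/// Runs the events through a gated sink. `handler_before_write` selects
/// whether the end tag handler fires before or after the end tag is written.
fn run(events: &[Event<'_>], handler_before_write: bool) -> String {
    let mut sink = GatedSink { can_write: false, out: String::new() };
    for ev in events {
        match ev {
            Event::StartTag(tag) => {
                // "Enter" handler: open the gate, then the start tag is written.
                if tag.starts_with("<a") {
                    sink.can_write = true;
                }
                sink.write(tag);
            }
            Event::Text(text) => sink.write(text),
            Event::EndTag(tag) => {
                let is_a = *tag == "</a>";
                if handler_before_write {
                    // Ordering described in the issue: the gate closes first,
                    // so the end tag itself is dropped.
                    if is_a {
                        sink.can_write = false;
                    }
                    sink.write(tag);
                } else {
                    // Requested ordering: write the end tag, then run the handler.
                    sink.write(tag);
                    if is_a {
                        sink.can_write = false;
                    }
                }
            }
        }
    }
    sink.out
}

fn main() {
    println!("{}", run(&demo_events(), true));
    println!("{}", run(&demo_events(), false));
}
```

With the handler firing first, the simulated output stops at `link`; with the handler firing after the write, the closing `</a>` is kept.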

mitsuhiko avatar Nov 28 '21 20:11 mitsuhiko

I see you've proposed #109 to add this capability.

Curious as to why this didn't seem to work using the existing on_end_tag, I had a go at getting it to work. This is my code to display only the a tags in a document, including the start and end tags. I assume that you've got something similar to the can_write flag in your code. Adding the extra string was the only additional step needed. It might be considered slightly hacky that I construct the end tag string manually, but end tags are pretty simple.

I don't object to your proposed change. I just wanted to understand why it was a problem. Have I understood it correctly, or have I missed an aspect of the problem you're describing?

use lol_html::{element, HtmlRewriter, Settings};
use std::{cell::RefCell, error::Error, rc::Rc};

const PAGE: &str = "
<html>
This <a href=\"http://example.com\">link</a> is an example.
</html>
";

struct OutputHandler {
    can_write: bool,
    extra: String,
}
impl OutputHandler {
    fn on(&mut self) {
        self.can_write = true;
    }
    fn off(&mut self) {
        self.can_write = false;
    }
    fn push(&mut self, extra: &str) {
        self.extra.push_str(extra)
    }
}

fn main() -> Result<(), Box<dyn Error>> {
    let output = Rc::new(RefCell::new(OutputHandler {
        can_write: false,
        extra: String::new(),
    }));
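    let element_content_handlers = vec![element!("a", |a| {
        // Start tag of an <a> element: let the sink write from here on.
        output.borrow_mut().on();
        let output = output.clone();
        a.on_end_tag(move |tag| {
            // This runs before the end tag reaches the sink, so queue the
            // end tag text manually and then switch output off.
            let mut handler = output.borrow_mut();
            handler.push(&format!("</{}>", tag.name()));
            handler.off();
            Ok(())
        })?;
        Ok(())
    })];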

    let output = output.clone();
    let mut rewriter = HtmlRewriter::new(
        Settings {
            element_content_handlers,
            ..Settings::default()
        },
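        |chunk: &[u8]| {
            let mut handler = output.borrow_mut();
            // Flush any manually queued end tag before looking at the chunk.
            if !handler.extra.is_empty() {
                print!("{}", handler.extra);
                handler.extra.clear();
            }
            // Drop chunks that arrive while output is switched off.
            if handler.can_write {
                print!("{}", String::from_utf8_lossy(chunk))
            }
        },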
    );
    rewriter.write(PAGE.as_ref())?;
    rewriter.end()?;
    Ok(())
}

jongiddy avatar Dec 16 '21 21:12 jongiddy

You're right in that it can be somewhat emulated, but it's quite inconvenient. This solution now also always inserts a closing tag, even if one did not exist in the original document. For me, the biggest issue was actually that I attempted to maintain a somewhat accurate tag stack to make more meaningful decisions, and having on_end_tag fire "within" the stack level creates a lot of complexity.

However, right now all of this is entirely blocked on #110 anyway. A solution to that might change the situation somewhat.

mitsuhiko avatar Dec 27 '21 18:12 mitsuhiko

Would this be easier if, rather than having the on_end_tag method on the Element, there was a separate end tag handler like there is a separate element text handler? That was my original idea, but the on_end_tag callback was easier to implement.

jongiddy avatar Dec 28 '21 06:12 jongiddy

Potentially. The current nice aspect of this on_end_tag business is that you can pass state from the start tag to the end tag somehow, but with the need to maintain a stack anyways that might not be necessary.
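
As a rough illustration of that trade-off, the std-only sketch below contrasts the two styles. None of it is lol-html API: `EndCallback`, `start_with_closure`, and `TagStack` are hypothetical names. The first style captures start-tag state in a closure that is called at the end tag, which is the shape the on_end_tag callback has; the second keeps the same state on an explicit stack, where a separate end tag handler could simply pop it.

```rust
// Std-only sketch; every name here is hypothetical and not part of lol-html.

/// Callback created at the start tag and invoked once at the end tag.
type EndCallback = Box<dyn FnOnce(&mut Vec<String>)>;

/// Style 1: the start handler captures its state (name and depth) in a closure.
fn start_with_closure(name: &str, depth: usize) -> EndCallback {
    let label = format!("{}@{}", name, depth);
    Box::new(move |log: &mut Vec<String>| log.push(format!("closed {}", label)))
}

/// Style 2: the same state lives on an explicit stack of open elements.
#[derive(Default)]
struct TagStack {
    open: Vec<String>,
}

impl TagStack {
    /// Called at a start tag: record the element with its current depth.
    fn enter(&mut self, name: &str) {
        let depth = self.open.len();
        self.open.push(format!("{}@{}", name, depth));
    }

    /// Called at an end tag: recover the state by popping the stack.
    fn leave(&mut self, log: &mut Vec<String>) {
        if let Some(label) = self.open.pop() {
            log.push(format!("closed {}", label));
        }
    }
}

/// Simulates <div><a></a></div> using captured closures.
fn closure_log() -> Vec<String> {
    let mut log = Vec::new();
    let outer = start_with_closure("div", 0);
    let inner = start_with_closure("a", 1);
    inner(&mut log);
    outer(&mut log);
    log
}

/// Simulates the same nesting using the explicit stack.
fn stack_log() -> Vec<String> {
    let mut log = Vec::new();
    let mut stack = TagStack::default();
    stack.enter("div");
    stack.enter("a");
    stack.leave(&mut log);
    stack.leave(&mut log);
    log
}

fn main() {
    println!("{:?}", closure_log());
    println!("{:?}", stack_log());
}
```

Both functions record the same two entries, which is the point made above: once a stack is maintained anyway, the state captured by the closure is already available from the stack.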

mitsuhiko avatar Dec 29 '21 19:12 mitsuhiko