rust-libxml icon indicating copy to clipboard operation
rust-libxml copied to clipboard

XPath issues when doc goes out of scope

Open jangernert opened this issue 4 years ago • 5 comments

On my way debugging and separating the issues listed in #60 I came across another thing that isn't related to async or threading.

An xpath context is non-functional after the document from which it was created goes out of scope and is dropped. There is no warning or error. It just doesn't return any results any more.

Here is a small example. The parse_html function causes the xpath evaluation to return an empty vector, but no unwrap fails. Doing the same thing in the unit test itself (the commented out code) works as expected:

#[cfg(test)]
mod tests {
    use std::fs::File;
    use std::io::Read;
    use libxml::parser::Parser;
    use libxml::xpath::Context;
    use libxml::tree::Node;

    #[test]
    fn test1() {
        let mut file = File::open("html.txt").unwrap();
        let mut html = String::new();
        file.read_to_string(&mut html).unwrap();

        // let parser = Parser::default_html();
        // let doc = parser.parse_string(html).unwrap();
        // let ctx = Context::new(&doc).unwrap();
        
        let ctx = parse_html(&html);
        
        let res = evaluate_xpath(&ctx, "//article/header");
        assert_eq!(res.len(), 1);
    }

    fn parse_html(html: &str) -> Context {
        let parser = Parser::default_html();
        let doc = parser.parse_string(html).unwrap();
        Context::new(&doc).unwrap()
    }

    fn evaluate_xpath(ctx: &Context, xpath: &str) -> Vec<Node> {
        let res = ctx.evaluate(xpath).unwrap();
        res.get_nodes_as_vec()
    }
}

I see that the XPath context does contain a weak reference to the document to prevent exactly this issue. But from my understanding the context needs a strong reference to extend the lifetime of the document.

Weak documentation:

Since a Weak reference does not count towards ownership, it will not prevent the inner value from being dropped, and Weak itself makes no guarantees about the value still being present and may return None when upgraded.

~~Not sure if I understand everything correctly, but the Document implementing drop directly and freeing the doc pointer seems wrong to me. Shouldn't the Document just drop its reference to the _Document and only when all references to _Document are dropped _Document itself calls xmlFreeDoc~~ -> oops, misread the code. It works exactly as I just explained. Guess it's a bit confusing that Document is declared in L82 and right after that the drop trait is implemented for _Document which is declared before Document.

jangernert avatar Nov 18 '19 19:11 jangernert