scraper

Problem creating a struct with a Select field

David-Valters opened this issue 1 year ago • 5 comments

Hello, I want to create an iterator that returns my own struct, built from the HTML passed in, but I run into a problem when constructing the iterator (in fn init). Could you tell me how to implement this correctly and how to set up the lifetimes? I have tried several approaches, but none of them worked. Here is some sample code:

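use scraper::{Html, Selector};

struct Film {
    name: String,
}
struct FilmParser<'a> {
    html: Html,
    foo_selector: Selector,
    // Borrows from `html` and `foo_selector` above, i.e. from the struct itself.
    search_iter: scraper::html::Select<'a, 'a>,
}


impl<'a> FilmParser<'a> {
    fn init(page_body: &str) -> Self {
        let html = Html::parse_document(page_body);
        let foo_selector = Selector::parse("foo").unwrap();
        Self {
            html,
            foo_selector,
            // The problem is here: `html` and `foo_selector` were just moved into
            // `Self`, and `search_iter` would have to borrow from the new struct's
            // own fields, which the borrow checker rejects.
            search_iter: html.select(&foo_selector),
        }
    }
}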
impl<'a> Iterator for FilmParser<'a> {
    type Item = Film;

    fn next(&mut self) -> Option<Self::Item> {
        self.search_iter.next().map(|element| Film {
            name: element.inner_html(),
        })
    }
}

David-Valters, Aug 02 '24 15:08

This is a fundamental limitation of Rust's type system: You are trying to create a self-referential struct, in this case because search_iter references the other two fields.

This is not possible in safe Rust and you will probably need to restructure your code to store html and foo_selector separately from search_iter. There are crates to produce self-referential structs, but they are often tricky to use or have soundness bugs.

adamreichold, Aug 02 '24 15:08

Please tell me how to organize the code so that the conversion from ElementRef to Film is hidden and I can use an iterator without unnecessary conversions. It would also be great if the selector could be passed by value instead of by reference.

David-Valters, Aug 03 '24 13:08

> it would be great if the selector could be given instead of being used by reference

But I think this is the crux of it, i.e. you will need to reference the selector from elsewhere if you want to wrap Select to produce an Iterator<Item=Film>.

adamreichold, Aug 03 '24 15:08

> need to reference the selector

I mean that the select method could take ownership of the selector, for example pub fn select<'a, 'b>(&'a self, selector: Selector)

David-Valters, Aug 03 '24 17:08

> need to reference the selector
>
> I mean that the select method could take ownership of the selector, for example pub fn select<'a, 'b>(&'a self, selector: Selector)

In general, it's better to use references when ownership is not needed. That's why select does not take ownership of the Selector here.

cfvescovo, Aug 03 '24 18:08

You should probably separate the parser state from the iterator state.

struct FilmParser {
    html: Html,
    foo_selector: Selector,
}

struct FilmIterator<'a> {
    search_iter: scraper::html::Select<'a, 'a>,
}

Then implement them separately, like so:

impl FilmParser { /* ... */ }
impl<'a> Iterator for FilmIterator<'a> { /* ... */ }

LoZack19, Aug 26 '24 13:08

I'll close this one, since the solutions provided work.

LoZack19, Aug 26 '24 13:08