docx-rs

Ability to get the last rendered page of a paragraph/element

Open Czechh opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe.

When reading a docx file, it is useful to know where a paragraph is located within the document, for example to move a renderer to that point or to generate references and quotes that point back into the docx document.

Describe the solution you'd like

Since the page number is determined by the render engine rather than stored in the docx file, I believe that editors like MS Word insert <w:lastRenderedPageBreak/> markers at the points where a page break was last rendered. Using this XML element to infer the page while constructing the document, and adding that value to each Paragraph and Table, should suffice.

Something like:

impl FromXML for Document {
    fn from_xml<R: Read>(reader: R) -> Result<Self, ReaderError> {
        let mut parser = EventReader::new(reader);
        let mut last_rendered_page_index = 0;
        let mut doc = Self::default();
        loop {
            let e = parser.next();
            match e {
                Ok(XmlEvent::StartElement {
                    attributes, name, ..
                }) => {
                    let e = XMLElement::from_str(&name.local_name).unwrap();
                    match e {
                        XMLElement::Paragraph => {
                            let mut p = Paragraph::read(&mut parser, &attributes)?;
                            p = p.last_rendered_page_break_number(last_rendered_page_index);
                            doc = doc.add_paragraph(p);
                            continue;
                        }
                        ...
                        XMLElement::LastRenderedPageBreak => {
                            last_rendered_page_index += 1;

                            continue;
                        }
                        _ => {}
                        ...
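
The counting idea can also be tried outside the library. The following is a minimal sketch under stated assumptions, not docx-rs code: the function name `paragraph_start_pages` is hypothetical, it scans the raw `document.xml` text with plain string matching instead of an XML parser, and it assumes the `w:` namespace prefix. In files written by Word, `<w:lastRenderedPageBreak/>` normally sits inside a run (`<w:r>`) within a paragraph, so the sketch counts the markers wherever they appear rather than only at the top level of the document.

```rust
/// Hypothetical helper (not part of docx-rs): for each `<w:p>` start tag in
/// `xml`, return the zero-based index of the page the paragraph starts on,
/// inferred by counting the `<w:lastRenderedPageBreak/>` markers seen so far.
fn paragraph_start_pages(xml: &str) -> Vec<usize> {
    let mut pages = Vec::new();
    let mut page = 0usize;
    let mut rest = xml;

    // Walk the text tag by tag, looking only at each tag's name.
    while let Some(pos) = rest.find('<') {
        let tag = &rest[pos + 1..];
        // The tag name ends at whitespace, '/', or '>'.
        // Closing tags start with '/', so their name is empty and is ignored.
        let end = tag
            .find(|c: char| c.is_whitespace() || c == '/' || c == '>')
            .unwrap_or(tag.len());

        match &tag[..end] {
            // A paragraph starts on whatever page we are currently on.
            "w:p" => pages.push(page),
            // Each marker means the renderer moved on to the next page.
            "w:lastRenderedPageBreak" => page += 1,
            _ => {}
        }
        rest = &tag[end..];
    }
    pages
}
```

Two caveats apply. The markers only reflect the layout from the last time an editor rendered and saved the file, so they can be stale or absent. Also, when a marker is the first item in a paragraph, this sketch still reports the previous page as the paragraph's start, because the page counter is read at the `<w:p>` tag.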

Describe alternatives you've considered

I have considered estimating the size of each element and doing a rough calculation of the likely page number. However, that approach is likely to be buggier and hackier than relying on the rendered page break markers.

Additional context

I'm happy to work on this, if the author agrees!

Czechh, Dec 21 '23 15:12

@Czechh Thanks for your proposal, and thanks for sponsoring. I am interested. Could I ask you to try making a PR?

bokuweb, Dec 24 '23 02:12

Of course! I'll get a PR going! Thank you for the response.

Czechh, Dec 26 '23 02:12