msoffice-pptx-rs

Extracting text from all slides

Open · nleroy917 opened this issue 9 months ago · 7 comments

Hello! I know you're not actively maintaining this crate anymore, and that's OK, but I was trying to use it to extract text from PowerPoint decks. I'm really interested in a pure-Rust implementation for speed and portability (the goal is to compile to WASM and do the extraction in the browser).

Anyway, I got pretty close, but I'm stuck trying to match on the `TextRun` enum. Here is my code:

```rust
use std::io::Write;

use msoffice_pptx::document::PPTXDocument; // module path assumed
use msoffice_pptx::pml::ShapeGroup;
use tempfile::NamedTempFile;

fn extract(data: &[u8]) -> Result<String, anyhow::Error> {
    // write the bytes to a temp file, since PPTXDocument::from_file takes a path
    // (this needs a real filesystem; std::fs calls fail on wasm32-unknown-unknown)
    let mut file = NamedTempFile::new()?;
    file.write_all(data)?;

    // read pptx file
    let path = file.into_temp_path();
    let pptx = PPTXDocument::from_file(&path).unwrap();

    // accumulate the text of every slide here
    let mut text = String::new();

    // iterate over slides
    for (_, slide) in &pptx.slide_map {
        for shape in slide.common_slide_data.shape_tree.shape_array.iter() {
            match shape {
                ShapeGroup::Shape(s) => {
                    // named `text_body` so it no longer shadows the output `text`
                    if let Some(text_body) = &s.text_body {
                        for paragraph in text_body.paragraph_array.iter() {
                            for text_run in paragraph.text_run_list.iter() {
                                //
                                // I am stuck here: can't import the proper enum to match on
                                //
                                match text_run {
                                    _ => (),
                                }
                            }
                        }
                    }
                }
                // TODO: recurse into the group's child shapes
                // (skipped for now instead of panicking via todo!())
                ShapeGroup::GroupShape(_) => (),
                ShapeGroup::GraphicFrame(_)
                | ShapeGroup::Connector(_)
                | ShapeGroup::Picture(_)
                | ShapeGroup::ContentPart(_) => (),
            }
        }
    }

    // return the accumulated text instead of an empty string
    Ok(text)
}
```

It seems like the enums I need (e.g. `TextRun`) live in `msoffice_shared`, but I can't import them.
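For illustration only, here is a self-contained sketch of what the missing `match` could look like once a run enum is in scope. The types `TextRun`, `Paragraph`, and `TextBody`, and the variants `Regular`, `LineBreak`, and `Field`, are hypothetical stand-ins that only mimic the nesting in the code above. They are not the real `msoffice_shared` API. The helper flattens a text body to plain text, one line per paragraph:

```rust
// Hypothetical stand-ins: they only mirror the *shape* of a text body
// (paragraphs containing runs). Names and variants are assumptions,
// not the actual msoffice_shared types.
enum TextRun {
    Regular(String), // a run of ordinary text
    LineBreak,       // an explicit line break inside a paragraph
    Field(String),   // a field (e.g. slide number) with its current text
}

struct Paragraph {
    runs: Vec<TextRun>,
}

struct TextBody {
    paragraphs: Vec<Paragraph>,
}

/// Flattens a text body into plain text: paragraphs are separated by '\n',
/// and explicit line breaks inside a paragraph also become '\n'.
fn text_body_to_string(body: &TextBody) -> String {
    let mut out = String::new();
    for (i, paragraph) in body.paragraphs.iter().enumerate() {
        if i > 0 {
            out.push('\n');
        }
        for run in &paragraph.runs {
            match run {
                TextRun::Regular(s) | TextRun::Field(s) => out.push_str(s),
                TextRun::LineBreak => out.push('\n'),
            }
        }
    }
    out
}

fn main() {
    let body = TextBody {
        paragraphs: vec![
            Paragraph {
                runs: vec![TextRun::Regular("Hello, ".into()), TextRun::Regular("world".into())],
            },
            Paragraph {
                runs: vec![
                    TextRun::Regular("line one".into()),
                    TextRun::LineBreak,
                    TextRun::Field("slide 3".into()),
                ],
            },
        ],
    };
    // prints three lines: "Hello, world", "line one", "slide 3"
    println!("{}", text_body_to_string(&body));
}
```

With the real crates, a common way to reach a type from a dependency that isn't re-exported is to add that crate (`msoffice_shared`) to your own `Cargo.toml` at the same version `msoffice_pptx` depends on, and then `use` it directly. The actual variant and field names would still need to be checked against its source.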

Any help is appreciated!!!

nleroy917 · May 13 '24 23:05