msoffice-pptx-rs
Extracting text from all slides
Hello! I know you are not actively maintaining this crate anymore, and that's OK, but I was trying to use it to extract text from PowerPoint decks. I'm really interested in a pure-Rust implementation for speed and portability (I'm trying to compile it to WASM so it can run in the browser).
Anyway, I got pretty close, but I'm stuck trying to match on the `TextRun` enums. Here is my code:
```rust
use std::io::Write; // needed for `file.write_all`

use msoffice_pptx::document::PPTXDocument;
use msoffice_pptx::pml::ShapeGroup;
use tempfile::NamedTempFile;

fn extract(data: &[u8]) -> Result<String, anyhow::Error> {
    // Write the bytes to a temp file, since PPTXDocument is read from a path
    let mut file = NamedTempFile::new()?;
    file.write_all(data)?;
    let path = file.into_temp_path();
    let pptx = PPTXDocument::from_file(&path).unwrap();

    // Accumulates the extracted text
    let mut text = String::new();

    // Iterate over slides and the shapes on each slide
    for (_, slide) in &pptx.slide_map {
        for shape in slide.common_slide_data.shape_tree.shape_array.iter() {
            match shape {
                ShapeGroup::Shape(s) => {
                    // Bind as `text_body`, not `text`, so the outer String isn't shadowed
                    if let Some(text_body) = &s.text_body {
                        for paragraph in text_body.paragraph_array.iter() {
                            for text_run in paragraph.text_run_list.iter() {
                                //
                                // I am stuck here... I can't import the proper enum to match on
                                //
                                match text_run {
                                    _ => (),
                                }
                            }
                            // One line per paragraph
                            text.push('\n');
                        }
                    }
                }
                // Group shapes can contain other shapes; not handled yet
                ShapeGroup::GroupShape(_) => todo!(),
                ShapeGroup::GraphicFrame(_)
                | ShapeGroup::Connector(_)
                | ShapeGroup::Picture(_)
                | ShapeGroup::ContentPart(_) => (),
            }
        }
    }
    Ok(text)
}
```
The enums I need appear to live in the `msoffice_shared` crate, but I can't import them from my code.
Any help is appreciated!
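In Rust, types from a crate that is only a transitive dependency (and not re-exported) generally can't be named until that crate is also added as a direct dependency in `Cargo.toml`, ideally at the same version `msoffice_pptx` uses so Cargo resolves both to one copy. Once the enum is importable, the match itself would look roughly like the minimal sketch below. It uses hypothetical stand-in types that mirror the nesting in the code above (`text_body` → `paragraph_array` → `text_run_list`). The names `TextRun`, `RegularTextRun`, `LineBreak`, `TextField`, their `text` fields, and the helper `append_text_body` are all assumptions for illustration, not the real `msoffice_shared` definitions.

```rust
// HYPOTHETICAL stand-in types mirroring the nesting in the question.
// Names, variants and fields are assumptions, not msoffice_shared's API.
#[derive(Debug)]
pub struct RegularTextRun {
    pub text: String,
}

#[derive(Debug)]
pub enum TextRun {
    RegularTextRun(Box<RegularTextRun>),
    LineBreak,
    TextField { text: Option<String> },
}

#[derive(Debug)]
pub struct TextParagraph {
    pub text_run_list: Vec<TextRun>,
}

#[derive(Debug)]
pub struct TextBody {
    pub paragraph_array: Vec<TextParagraph>,
}

/// Hypothetical helper: appends the plain text of `body` to `out`,
/// ending each paragraph with a newline.
pub fn append_text_body(body: &TextBody, out: &mut String) {
    for paragraph in &body.paragraph_array {
        for run in &paragraph.text_run_list {
            match run {
                // Ordinary run: copy its text verbatim
                TextRun::RegularTextRun(r) => out.push_str(&r.text),
                // Soft line break inside a paragraph
                TextRun::LineBreak => out.push('\n'),
                // Field (e.g. a slide number) with or without cached text
                TextRun::TextField { text: Some(t) } => out.push_str(t),
                TextRun::TextField { text: None } => {}
            }
        }
        out.push('\n');
    }
}

fn main() {
    let body = TextBody {
        paragraph_array: vec![TextParagraph {
            text_run_list: vec![
                TextRun::RegularTextRun(Box::new(RegularTextRun { text: "Hello, ".into() })),
                TextRun::RegularTextRun(Box::new(RegularTextRun { text: "world".into() })),
            ],
        }],
    };
    let mut out = String::new();
    append_text_body(&body, &mut out);
    print!("{out}");
}
```

Matching exhaustively (no `_` arm) means the compiler flags any variant the real enum has that the extractor doesn't handle yet.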