
pptx: classify slide title as Title element

Open scanny opened this issue 2 years ago • 0 comments

Context

Perhaps 99.9% of PowerPoint slides include a dedicated title shape. The only built-in slide layout that does not provide a title shape is "Blank Slide", which is typically reserved for full-slide diagrams or images.

Situation

partition_pptx() currently treats the title shape the same as any other shape, so title detection is based only on the form of the text. Detecting the title shape itself, by contrast, is simple and reliable, and the fact that the author placed the text in the title shape is a clear indicator of intent.

Classifying title text by shape type prevents misclassifying the title or splitting it into two or more elements when the author used line breaks for effect.

Solution

Add a check for title shape while iterating through shapes on a slide and classify the title shape text as a Title element.
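
A minimal sketch of the proposed check, not the actual partition_pptx() code. It assumes a slide object shaped like a python-pptx slide, where `slide.shapes.title` is the title placeholder or `None`, and each shape exposes `shape_id`, `has_text_frame`, and `text_frame.text`. The `Element` class and `iter_slide_elements()` are hypothetical names for illustration, and labelling every non-title shape as "Text" is a simplification standing in for the existing text-form classification.

```python
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass
class Element:
    """Hypothetical stand-in for a partitioned document element."""

    category: str  # "Title" or "Text" in this sketch
    text: str


def iter_slide_elements(slide: Any) -> Iterator[Element]:
    """Yield one element per text-bearing shape, using shape type for the title."""
    title_shape = slide.shapes.title  # None when the layout has no title shape
    title_id = title_shape.shape_id if title_shape is not None else None

    for shape in slide.shapes:
        if not shape.has_text_frame:
            continue  # pictures, connectors, etc. carry no text frame
        text = shape.text_frame.text
        if not text.strip():
            continue  # skip empty placeholders

        if title_id is not None and shape.shape_id == title_id:
            # Collapse line breaks and paragraph breaks so a title the author
            # split across lines for effect stays a single Title element.
            yield Element("Title", " ".join(text.split()))
        else:
            # Assumption: the real partitioner would classify this by text form.
            yield Element("Text", text.strip())
```

Comparing `shape_id` values rather than object identity is a deliberate choice here, on the assumption that iterating the shape collection may hand back a different proxy object than `slide.shapes.title` for the same underlying shape.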

scanny, Sep 20 '23 19:09