python-pptx
Support for parsing Equations
Issue #528 showed how to insert Office Math ML (equation) text, but is there any way to parse or extract that text? #706 seemed to handle it, but it still doesn't work for all text.
For example, in the extract from slide.xml below, it parses "We factorise it as" under the <a:r> tag but does not parse "𝑥" under the <a14:m> tag.
<mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main">
<mc:Choice Requires="a14">
<p:sp>
<p:nvSpPr>
<p:cNvPr id="8" name="TextBox 7" />
<p:cNvSpPr txBox="1" />
<p:nvPr />
</p:nvSpPr>
<p:spPr>
<a:xfrm>
<a:off x="1422400" y="4460458" />
<a:ext cx="4528458" cy="682046" />
</a:xfrm>
<a:prstGeom prst="rect">
<a:avLst />
</a:prstGeom>
</p:spPr>
<p:txBody>
<a:bodyPr wrap="square" lIns="0" tIns="0" rIns="0" bIns="0" rtlCol="0" anchor="t">
<a:spAutoFit />
</a:bodyPr>
<a:lstStyle />
<a:p>
<a:pPr>
<a:lnSpc>
<a:spcPts val="5725" />
</a:lnSpc>
</a:pPr>
<a:r>
<a:rPr lang="en-IN" sz="4000">
<a:solidFill>
<a:schemeClr val="bg1" />
</a:solidFill>
</a:rPr>
<a:t>We factorise it as </a:t>
</a:r>
<a14:m>
<m:oMath xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
<m:r>
<a:rPr lang="en-US" sz="4000" i="1" spc="-229">
<a:solidFill>
<a:srgbClr val="FFC000" />
</a:solidFill>
<a:latin typeface="Cambria Math" />
<a:ea typeface="Cambria Math" panose="02040503050406030204" pitchFamily="18"
charset="0" />
</a:rPr>
<m:t>𝑥</m:t> <!-- this text is not extracted -->
</m:r>
</m:oMath>
</a14:m>
<a:r>
<a:rPr lang="en-IN" sz="4000">
<a:solidFill>
<a:schemeClr val="bg1" />
</a:solidFill>
</a:rPr>
<a:t> =</a:t>
</a:r>
<a:endParaRPr lang="en-US" sz="4000" spc="-229" dirty="0">
<a:solidFill>
<a:schemeClr val="bg1" />
</a:solidFill>
<a:latin typeface="+mj-lt" />
</a:endParaRPr>
</a:p>
</p:txBody>
</p:sp>
</mc:Choice>
<mc:Fallback xmlns="">
<p:sp>
<p:nvSpPr>
<p:cNvPr id="8" name="TextBox 7" />
<p:cNvSpPr txBox="1">
<a:spLocks noRot="1" noChangeAspect="1" noMove="1" noResize="1" noEditPoints="1"
noAdjustHandles="1" noChangeArrowheads="1" noChangeShapeType="1" noTextEdit="1" />
</p:cNvSpPr>
<p:nvPr />
</p:nvSpPr>
<p:spPr>
<a:xfrm>
<a:off x="1422400" y="4460458" />
<a:ext cx="4528458" cy="682046" />
</a:xfrm>
<a:prstGeom prst="rect">
<a:avLst />
</a:prstGeom>
<a:blipFill>
<a:blip r:embed="rId4" />
<a:stretch>
<a:fillRect l="-6729" t="-13393" r="-1211" b="-43750" />
</a:stretch>
</a:blipFill>
</p:spPr>
<p:txBody>
<a:bodyPr />
<a:lstStyle />
<a:p>
<a:r>
<a:rPr lang="en-US">
<a:noFill />
</a:rPr>
<a:t> </a:t>
</a:r>
</a:p>
</p:txBody>
</p:sp>
</mc:Fallback>
</mc:AlternateContent>
Is there any way to extract this text by parsing the XML tree directly?
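One possible workaround, offered only as a sketch and not as python-pptx's own API: walk the slide XML directly and collect both DrawingML text (`a:t`) and Office Math text (`m:t`) in document order, skipping `mc:Fallback` so the image placeholder is not counted. It uses only the standard library; the helper names `paragraph_text` and `iter_paragraph_texts` and the trimmed `SAMPLE` string are hypothetical, and the XML is assumed to have been read from the slide part (for example `ppt/slides/slide1.xml` inside the .pptx zip).

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the slide.xml excerpt above.
A = "http://schemas.openxmlformats.org/drawingml/2006/main"
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"

A_P = f"{{{A}}}p"                          # paragraph element
TEXT_TAGS = {f"{{{A}}}t", f"{{{M}}}t"}     # plain-text runs and math-text runs
MC_FALLBACK = f"{{{MC}}}Fallback"          # picture fallback, carries no real text


def paragraph_text(p):
    """Join a:t and m:t text inside one a:p, in document order."""
    return "".join(el.text or "" for el in p.iter() if el.tag in TEXT_TAGS)


def iter_paragraph_texts(root):
    """Yield the text of every a:p under root, ignoring mc:Fallback subtrees."""
    if root.tag == MC_FALLBACK:
        return
    if root.tag == A_P:
        yield paragraph_text(root)
        return
    for child in root:
        yield from iter_paragraph_texts(child)


# Hypothetical, trimmed-down version of the excerpt above.
SAMPLE = """
<mc:AlternateContent
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
    xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
  <mc:Choice Requires="a14"><p:sp><p:txBody><a:p>
    <a:r><a:t>We factorise it as </a:t></a:r>
    <a14:m>
      <m:oMath xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
        <m:r><m:t>𝑥</m:t></m:r>
      </m:oMath>
    </a14:m>
    <a:r><a:t> =</a:t></a:r>
  </a:p></p:txBody></p:sp></mc:Choice>
  <mc:Fallback><p:sp><p:txBody><a:p>
    <a:r><a:t> </a:t></a:r>
  </a:p></p:txBody></p:sp></mc:Fallback>
</mc:AlternateContent>
"""

if __name__ == "__main__":
    for text in iter_paragraph_texts(ET.fromstring(SAMPLE)):
        print(text)  # prints: We factorise it as 𝑥 =
```

This flattens structured math (fractions, superscripts, and so on) into the plain sequence of its `m:t` runs, so it recovers the characters but not the layout of an equation.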
Also see issue #947. I'm interested in this project as a way to extract text from professors' slides for a platform that crowdsources contributions, revisions, and reviews of teaching materials in quantum information.