Inserting OMML into Text Frame or Paragraph
Hello,
I am trying to build a pipeline to convert existing MathML to OMML and insert it into a text frame in PPT.
I came across a very useful post regarding inserting MathML into a Word doc with python-docx. It involves performing an XSL transformation on the plain MML using Office's "MML2OMML.XSL", then appending that etree object to a new paragraph. (https://github.com/python-openxml/python-docx/issues/320)
Here's an example that works for me with python-docx:
from docx import Document
from docx.shared import Inches
from lxml import etree
doc = Document()
# Convert MathML (MML) into Office MathML (OMML) using an XSLT stylesheet
tree = etree.fromstring('<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi><mo>=</mo><mstyle displaystyle="true"><mfrac><mrow><mrow><mo>−</mo><mi>b</mi></mrow><mo>±</mo><msqrt><msup><mi>b</mi><mn>2</mn></msup><mo>−</mo><mrow><mn>4</mn><mo>⁢</mo><mi>a</mi><mo>⁢</mo><mi>c</mi></mrow></msqrt></mrow><mrow><mn>2</mn><mo>⁢</mo><mi>a</mi></mrow></mfrac></mstyle></math>')
xslt = etree.parse('C:/Program Files/Microsoft Office/root/Office16/MML2OMML.XSL')
transform = etree.XSLT(xslt)
new_dom = transform(tree)
p = doc.add_paragraph()
p._element.append(new_dom.getroot())
doc.save('testDoc.docx')
I tried something similar using python-pptx. It runs without throwing any errors and creates the specified file; however, the created presentation contains no equation.
Here's my attempt with python-pptx:
from pptx import Presentation
from pptx.util import Inches, Pt
from lxml import etree
prs = Presentation()
blank_slide_layout = prs.slide_layouts[6]
slide = prs.slides.add_slide(blank_slide_layout)
left = top = width = height = Inches(1)
txBox = slide.shapes.add_textbox(left, top, width, height)
tf = txBox.text_frame
# Convert MathML (MML) into Office MathML (OMML) using an XSLT stylesheet
tree = etree.fromstring('<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi><mo>=</mo><mstyle displaystyle="true"><mfrac><mrow><mrow><mo>−</mo><mi>b</mi></mrow><mo>±</mo><msqrt><msup><mi>b</mi><mn>2</mn></msup><mo>−</mo><mrow><mn>4</mn><mo>⁢</mo><mi>a</mi><mo>⁢</mo><mi>c</mi></mrow></msqrt></mrow><mrow><mn>2</mn><mo>⁢</mo><mi>a</mi></mrow></mfrac></mstyle></math>')
xslt = etree.parse('C:/Program Files/Microsoft Office/root/Office16/MML2OMML.XSL')
transform = etree.XSLT(xslt)
new_dom = transform(tree)
p = tf.add_paragraph()
p._element.append(new_dom.getroot())
prs.save('testDoc.pptx')
Clearly I'm doing something wrong here, but I'm not quite sure what.
I simplified my MathML to <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>, manually inserted the resulting OMML into a blank PPT slide, and then compared slide1.xml between an empty slide and the slide with the formula.
Here's the XML that was added to the slide when I inserted the formula:
<mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
<mc:Choice xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main" Requires="a14">
<p:sp>
<p:nvSpPr>
<p:cNvPr id="2" name="Rectangle 1">
<a:extLst>
<a:ext uri="{FF2B5EF4-FFF2-40B4-BE49-F238E27FC236}">
<a16:creationId xmlns:a16="http://schemas.microsoft.com/office/drawing/2014/main" id="{63784C15-6834-4CF0-BA6B-62B8B3ACA648}" />
</a:ext>
</a:extLst>
</p:cNvPr>
<p:cNvSpPr />
<p:nvPr />
</p:nvSpPr>
<p:spPr>
<a:xfrm>
<a:off x="4388007" y="3244334" />
<a:ext cx="367985" cy="369332" />
</a:xfrm>
<a:prstGeom prst="rect">
<a:avLst />
</a:prstGeom>
</p:spPr>
<p:txBody>
<a:bodyPr wrap="none">
<a:spAutoFit />
</a:bodyPr>
<a:lstStyle />
<a:p>
<a14:m>
<m:oMathPara xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
<m:oMathParaPr>
<m:jc m:val="centerGroup" />
</m:oMathParaPr>
<m:oMath xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
<m:r>
<a:rPr lang="en-US" i="1">
<a:latin typeface="Cambria Math" panose="02040503050406030204" pitchFamily="18" charset="0" />
</a:rPr>
<m:t>𝑥</m:t>
</m:r>
</m:oMath>
</m:oMathPara>
</a14:m>
<a:endParaRPr lang="en-US" dirty="0" />
</a:p>
</p:txBody>
</p:sp>
</mc:Choice>
<mc:Fallback>
<p:sp>
<p:nvSpPr>
<p:cNvPr id="2" name="Rectangle 1">
<a:extLst>
<a:ext uri="{FF2B5EF4-FFF2-40B4-BE49-F238E27FC236}">
<a16:creationId xmlns:a16="http://schemas.microsoft.com/office/drawing/2014/main" id="{63784C15-6834-4CF0-BA6B-62B8B3ACA648}" />
</a:ext>
</a:extLst>
</p:cNvPr>
<p:cNvSpPr>
<a:spLocks noRot="1" noChangeAspect="1" noMove="1" noResize="1" noEditPoints="1" noAdjustHandles="1" noChangeArrowheads="1" noChangeShapeType="1" noTextEdit="1" />
</p:cNvSpPr>
<p:nvPr />
</p:nvSpPr>
<p:spPr>
<a:xfrm>
<a:off x="4388007" y="3244334" />
<a:ext cx="367985" cy="369332"/>
</a:xfrm>
<a:prstGeom prst="rect">
<a:avLst />
</a:prstGeom>
<a:blipFill>
<a:blip r:embed="rId2" />
<a:stretch>
<a:fillRect />
</a:stretch>
</a:blipFill>
</p:spPr>
<p:txBody>
<a:bodyPr />
<a:lstStyle />
<a:p>
<a:r>
<a:rPr lang="en-US">
<a:noFill />
</a:rPr>
<a:t> </a:t>
</a:r>
</a:p>
</p:txBody>
</p:sp>
</mc:Fallback>
</mc:AlternateContent>
Does anyone have any clever ideas for how I might go about inserting a formula into a blank slide/text field/paragraph?
The first step here is to add an equation to PowerPoint by hand, using the equation editor, and then examine the XML that produces, since that XML is known to work. It helps a lot to make the example presentation as simple as possible: one slide with one shape. Then you can find the XML in question with:
$ opc browse my-example.pptx slide1.xml
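If the opc tool isn't installed, the same XML can be pulled out with the standard library, since a .pptx file is a ZIP archive. This is only a minimal sketch: read_slide_xml is a hypothetical helper name, and it assumes the slide lives at the usual part name ppt/slides/slideN.xml.

```python
import zipfile
from xml.dom import minidom


def read_slide_xml(pptx_file, slide_number=1):
    """Return the pretty-printed XML of one slide part in a .pptx file."""
    # Assumption: slides use the conventional part name ppt/slides/slideN.xml.
    part_name = f"ppt/slides/slide{slide_number}.xml"
    # A .pptx file is a ZIP archive; each slide is stored as its own XML part.
    with zipfile.ZipFile(pptx_file) as archive:
        raw = archive.read(part_name)
    return minidom.parseString(raw).toprettyxml(indent="  ")
```

Calling print(read_slide_xml("my-example.pptx")) on a one-slide, one-shape presentation should then show the same slide XML that the command above browses.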
Thank you for the support @scanny!
I was able to append the math element with the following code:
from pptx import Presentation
from pptx.util import Inches, Pt
from lxml import etree
prs = Presentation()
blank_slide_layout = prs.slide_layouts[6]
slide = prs.slides.add_slide(blank_slide_layout)
left = top = width = height = Inches(1)
txBox = slide.shapes.add_textbox(left, top, width, height)
tf = txBox.text_frame
# Convert MathML (MML) into Office MathML (OMML) using an XSLT stylesheet
tree = etree.fromstring('<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi><mo>=</mo><mstyle displaystyle="true"><mfrac><mrow><mrow><mo>−</mo><mi>b</mi></mrow><mo>±</mo><msqrt><msup><mi>b</mi><mn>2</mn></msup><mo>−</mo><mrow><mn>4</mn><mo>⁢</mo><mi>a</mi><mo>⁢</mo><mi>c</mi></mrow></msqrt></mrow><mrow><mn>2</mn><mo>⁢</mo><mi>a</mi></mrow></mfrac></mstyle></math>')
xslt = etree.parse('C:/Program Files/Microsoft Office/root/Office16/MML2OMML.XSL')
wrapper = etree.fromstring('<a14:m xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main"><m:oMathPara xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"></m:oMathPara></a14:m>')
transform = etree.XSLT(xslt)
new_dom = transform(tree)
# wrapper[0] is the <m:oMathPara> element; getchildren() is deprecated in lxml
wrapper[0].append(new_dom.getroot())
p = tf.add_paragraph()
p._element.append(wrapper)
prs.save('testDoc.pptx')
If there are text nodes to be inserted before or after the math content, that can be accomplished like this:
textWrapOpen = '<a:r xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"><a:t>'
textWrapClose = '</a:t></a:r>'
string = "Some string "
textTree = etree.fromstring(textWrapOpen + string + textWrapClose)
p._element.append(textTree)
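One caveat with building the run by string concatenation: if the text contains characters such as & or <, the resulting string is no longer well-formed XML and parsing it will fail. A minimal sketch of one way around that, escaping the text first with the standard library; make_run_xml is a hypothetical helper name, not part of python-pptx:

```python
from xml.sax.saxutils import escape

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def make_run_xml(text):
    """Build the XML string for an <a:r> text run, escaping XML special characters."""
    # escape() replaces &, < and > so the text cannot break the markup.
    return f'<a:r xmlns:a="{A_NS}"><a:t>{escape(text)}</a:t></a:r>'
```

The returned string can be passed to etree.fromstring() and appended to p._element in the same way as textTree.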
I'm still working out some kinks, but this should be a good starting point for whoever might be attempting this in the future.
Hey there, this was really helpful, thanks; it saved me so much time. But I am facing another issue: I am importing a formula from a Word document and then inserting it into a PowerPoint slide. It gets the work done, but it inserts gibberish into the slide.
import re
import latex2mathml.converter
from pptx import Presentation
from docx import Document
from docx.shared import Inches
import docxlatex as latex
from lxml import etree
def latex_to_word(latex_input, for_ppt=False):
    # LaTeX -> MathML -> OMML
    mathml = latex2mathml.converter.convert(latex_input, display="block")
    tree = etree.fromstring(mathml)
    # Raw string so the backslashes in the Windows path are not read as escapes
    xslt = etree.parse(
        r'D:\Programming\python\pptxMathVisualize\src\MML2OMML.XSL'
    )
    transform = etree.XSLT(xslt)
    new_dom = transform(tree)
    if for_ppt:
        wrapper = etree.fromstring(
            '''<a14:m xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main">
            <m:oMath xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
            </m:oMath>
            </a14:m>''')
        # wrapper[0] is the <m:oMath> element; getchildren() is deprecated in lxml
        wrapper[0].append(new_dom.getroot())
        return wrapper
    return new_dom.getroot()


def getLatexEquations(text):
    # Capture everything between single $ delimiters
    return re.findall(r'\$([^$].*?)\$', text)


document = Document('demo.docx')
latexDoc = latex.Document('demo.docx')
prs = Presentation()
title_slide_layout = prs.slide_layouts[6]
slide = prs.slides.add_slide(title_slide_layout)
left = top = width = height = Inches(1)
equations = getLatexEquations(latexDoc.get_text())
# enumerate() gives each equation its own position, even if two equations are identical
for i, equation in enumerate(equations):
    p = document.add_paragraph()
    p._element.append(latex_to_word(equation))
    txBoxNum = i + 1.5
    txBox = slide.shapes.add_textbox(left, Inches(1 * txBoxNum), width, height)
    tf = txBox.text_frame
    txBoxP = tf.add_paragraph()
    txBoxP._element.append(latex_to_word(equation, for_ppt=True))
prs.save('test.pptx')
document.save('test1.docx')
Sorry, I haven't had time to refactor, so let me explain the code a little. There are two document variables: latexDoc is for importing the formulas from the docx, and document is for editing the docx file. The prs variable is self-explanatory. I extract the equations with a regex to separate them from the text, and then put them in text boxes on the pptx slide.
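One difference from the earlier working snippet may be worth checking. That snippet appends the transform output to an empty m:oMathPara, while latex_to_word appends it to an m:oMath. If the stylesheet's output root is itself m:oMath (an assumption here, although it matches the hand-made slide XML earlier in the thread), the PowerPoint branch ends up with one m:oMath nested inside another. Whether that is what causes the gibberish is not confirmed. A minimal stdlib-only sketch to test for the nesting; has_nested_omath is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
OMATH = f"{{{M_NS}}}oMath"


def has_nested_omath(xml_text):
    """Return True if any m:oMath element contains another m:oMath."""
    root = ET.fromstring(xml_text)
    for outer in root.iter(OMATH):
        # iter() yields the element itself first, so ignore that match.
        if any(inner is not outer for inner in outer.iter(OMATH)):
            return True
    return False
```

With lxml, the wrapper can be serialized first, for example has_nested_omath(etree.tostring(wrapper)). If it returns True, swapping the wrapper's m:oMath for m:oMathPara, as in the earlier snippet, would be the first thing to try.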
Here is the original formula:

This is the output:
