.docx to TEI P5 XML Document conversion fails

Open • fricke-steyer opened this issue 6 years ago • 6 comments

Can you help me? Our other files convert fine; only this one doesn't work. What's wrong? Kind regards, Henrike

emotion_analysis_2019.docx

Error occured. Please check the filetype and try again.?

Error: class pl.psnc.dl.ege.exception.ConverterException

Processing terminated by xsl:message at line 130 in fields.xsl

fricke-steyer • Nov 19 '19

I did a little debugging, and the error I get (when running on the command line) is:

 fldSimple: unrecognized type REF BMfig_wheel \* MERGEFORMAT 

This originates from the Word file here:

<w:fldSimple w:instr="REF BMfig_wheel \* MERGEFORMAT ">
    <w:r w:rsidRPr="005B4B5A">
        <w:rPr>
            <w:rStyle w:val="AbbVerweiszfdgZchn"/>
        </w:rPr>
        <w:t>1</w:t>
    </w:r>
</w:fldSimple>

This is the "1" reference in "The wheel (Figure 1) is constructed …".

I'm no DOCX expert, so I don't know which (arcane) feature this is or how to handle it correctly. I'd therefore like to close the issue here and move it to the Stylesheets issues, if anyone thinks we should follow up on it.
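For reference, the value of w:instr is a Word field code. REF inserts the contents of a bookmark (here BMfig_wheel), and \* MERGEFORMAT is a general formatting switch telling Word to apply the formatting of the previous result to the new one when the field is updated. The Python sketch below only splits such an instruction into field type, arguments and switches. The helper is hypothetical and is not how fields.xsl parses fields. It assumes that only the general switches \*, \# and \@ take a following argument.

```python
import re
from typing import NamedTuple


class FieldInstr(NamedTuple):
    field_type: str        # e.g. "REF", "PAGEREF", "SEQ"
    arguments: list[str]   # positional arguments, e.g. a bookmark name
    switches: list[str]    # e.g. "\\* MERGEFORMAT" or "\\h"


# A token is a quoted string, a backslash switch such as \* or \h, or a bare word.
_TOKEN = re.compile(r'"[^"]*"|\\\S+|\S+')

# Assumption: only these general switches take a following argument.
_SWITCHES_WITH_ARG = {"\\*", "\\#", "\\@"}


def parse_field_instr(instr: str) -> FieldInstr:
    """Split a w:instr value such as 'REF BMfig_wheel \\* MERGEFORMAT ' (hypothetical helper)."""
    tokens = _TOKEN.findall(instr)
    if not tokens:
        raise ValueError("empty field instruction")
    field_type, rest = tokens[0].upper(), tokens[1:]
    arguments: list[str] = []
    switches: list[str] = []
    i = 0
    while i < len(rest):
        tok = rest[i]
        if tok in _SWITCHES_WITH_ARG and i + 1 < len(rest):
            # Keep a switch and its argument together, e.g. "\* MERGEFORMAT".
            switches.append(f"{tok} {rest[i + 1]}")
            i += 2
        elif tok.startswith("\\"):
            switches.append(tok)  # argument-less switch such as \h
            i += 1
        else:
            arguments.append(tok.strip('"'))
            i += 1
    return FieldInstr(field_type, arguments, switches)
```

Applied to the instruction above, this yields the field type REF, the single argument BMfig_wheel and the switch \* MERGEFORMAT. The bookmark name is the part that has to resolve to something in the document.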

peterstadler • Nov 19 '19

Running this online with docxtotei produces the (slightly) more helpful error message:

 [xslt] fldSimple: unrecognized type REF BMfig_wheel \* MERGEFORMAT

which appears to relate to the reference to a graphic in section 2.2:

"The wheel (Figure 1) is constructed in the fashion of a color wheel"

I don't have Word here, so I cannot be sure. However, if I delete that parenthesized reference, save the file as DOCX, and try the conversion again, everything works fine.

Maybe the problem is that the graphic file isn't included in the document?
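Whether the REF target actually exists can be checked without Word. A REF field normally points at a bookmark, which WordprocessingML marks with a w:bookmarkStart element carrying a w:name attribute. The sketch below uses only the Python standard library, and its function name is hypothetical. It lists REF targets in word/document.xml that have no matching bookmark. It assumes the fields are stored as w:fldSimple, as in this file. It does not examine complex fields (w:fldChar/w:instrText) or parts other than the main document, such as footnotes or headers.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace, used for the w: prefix in document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# "REF <bookmark> ..." at the start of a field instruction; field names are case-insensitive.
_REF = re.compile(r"\s*REF\s+(\S+)", re.IGNORECASE)


def missing_ref_targets(docx_path: str) -> list[str]:
    """Return REF targets in w:fldSimple fields that no w:bookmarkStart defines (hypothetical helper)."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    # Every bookmark name defined in the main document part.
    bookmarks = {bm.get(W + "name") for bm in root.iter(W + "bookmarkStart")}
    missing: list[str] = []
    for fld in root.iter(W + "fldSimple"):
        m = _REF.match(fld.get(W + "instr", ""))
        # Report each dangling target once, in document order.
        if m and m.group(1) not in bookmarks and m.group(1) not in missing:
            missing.append(m.group(1))
    return missing
```

If this were run against emotion_analysis_2019.docx, a non-empty result would indicate that the cross-reference points at a bookmark missing from the document, rather than at a missing graphic file.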

On 11/12/2019 15:00, fricke-steyer wrote:

emotion_analysis_2019.docx https://github.com/TEIC/oxgarage/files/3863581/emotion_analysis_2019.docx

lb42 • Dec 11 '19

Rather than open a new issue, I'm posting another Word file here that causes the Stylesheets to fail. At first glance it looks easier to fix than the previous one; the error is:

A sequence of more than one item is not allowed as the first argument of fn:starts-with() ("VAROVALKE_1_brez ozadja copy", "VAROVALKE_2_brez ozadja copy") ; SystemID: file:/project/tei/convert/Stylesheets/docx/from/graphics.xsl; Line#: 83; Column#: 12

TEI_Stylesheet_crash-test.docx

TomazErjavec • Oct 13 '20

The Council F2F group looked at this, and the problem is actually a pointer to something that does not exist in the Word document itself. We (@martinascholger, @joeytakeda and I) think that fields.xsl should not terminate processing at line 129 but should instead output a <hi> element with an error flag in its @rend attribute and then apply templates to provide some helpful content.
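The proposed fallback might look roughly like this. It is a Python sketch of the behaviour, not a patch to fields.xsl. Instead of stopping the conversion, it logs the same kind of message and returns a TEI hi element flagged in @rend. That element wraps the field's cached result, meaning the runs Word stored inside w:fldSimple. The rend value "error-unrecognized-field" and the function name are hypothetical. Here the text is simply concatenated, whereas the XSLT would apply templates to the child runs.

```python
import logging
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical flag value; the proposal does not specify one.
ERROR_REND = "error-unrecognized-field"


def unrecognized_field_to_tei(fld: ET.Element) -> ET.Element:
    """Fallback for a w:fldSimple whose field type is not handled (hypothetical helper).

    Logs a warning instead of aborting, and returns <hi rend="..."> containing
    the field's visible text so the reader still sees e.g. the "1" of "Figure 1".
    """
    instr = fld.get(W + "instr", "").strip()
    logging.warning("fldSimple: unrecognized type %s", instr)
    hi = ET.Element(f"{{{TEI_NS}}}hi", {"rend": ERROR_REND})
    # Concatenate the text of all w:t descendants (the cached field result).
    hi.text = "".join(t.text or "" for t in fld.iter(W + "t"))
    return hi
```

Applied to the w:fldSimple element quoted earlier in this thread (with the w namespace declared), this returns a hi element with rend="error-unrecognized-field" and the text "1", and conversion of the rest of the document can continue.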

martindholmes • Sep 15 '25

@TomazErjavec, regarding the fn:starts-with() error in graphics.xsl that you reported:

If this is still an issue, could you please open a new issue for the error?

joeytakeda • Sep 15 '25

@TomazErjavec, as with the other issue, one workaround would be to use TEI Publisher's conversion. I'm attaching the results:

TEI_Stylesheet_crash-test.docx.xml

@fricke-steyer, Publisher's conversion of the document you posted is also successful:

emotion_analysis_2019.docx.xml

tuurma • Sep 15 '25