markdown-transform
CiceroMark<->OOXML transformers
Feature Request 🛍️
Incorporating the CiceroMark->OOXML transformer and improving the currently implemented OOXML->CiceroMark transformer.
Use Case
It will allow the conversion of Docx files into CiceroMark JSON and vice versa. In addition, Docx files can then be converted into other formats such as PDF or HTML.
Possible Solution
The transformer for OOXML->CiceroMark is already implemented. It needs to be updated with some entities to allow full transformation.
The transformer for CiceroMark->OOXML was created in the cicero-word-add-in branch. It needs to be ported here with some changes to allow the transformation.
Detailed Description
Currently, the CiceroMark->OOXML transformer supports the following conversions:
| CiceroMark Entity | OOXML Entity |
|---|---|
| Text | <w:t> |
| Paragraph | <w:p>Content</w:p> |
| Linebreak | <w:p/> |
| Softbreak | <w:r><w:sym/></w:r> |
| Emph | <w:i> |
| Variable | <w:sdt> |
| List Block/ List | <w:numPr><w:num w:val={ordered/unordered}/></w:numPr> |
| List Item(Text) | <w:t/> |
| List Item(Variable) | <w:sdt/> |
Conversions which are left:
- [x] Strong
- [x] Code
- [x] Link
- [ ] Image
- [ ] BlockQuote
- [x] CodeBlock
- [x] ThematicBreak
- [x] Clause
- [x] Optional
- [x] Conditional
- [x] Formula / Ergo expressions
Among the remaining conversions, we need to decide which ones should get high priority and which can be given a lower priority. Furthermore, we also need to think about whether all these will be present in the contract (IMO, Code and CodeBlock generally won't occur in the contract).
Entities and their corresponding CiceroMark
Heading
{
"$class": "org.accordproject.commonmark.Heading",
"level": "2",
"nodes": [ ... ]
}
Paragraph
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [ ... ]
}
Text
{ "$class": "org.accordproject.commonmark.Text",
"text": "Try TemplateMark" }
Softbreak
{
"$class": "org.accordproject.commonmark.Softbreak"
}
Variable
{
"$class": "org.accordproject.ciceromark.Variable",
"value": "\"Widgets\"",
"name": "deliverable",
"elementType": "String"
}
Link
{
"$class": "org.accordproject.commonmark.Link",
"destination": "https://github.com/accordproject/markdown-transform",
"title": "",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "@accordproject/markdown-transform"
}
]
}
Image
{
"$class": "org.accordproject.commonmark.Image",
"destination": "https://github.com/accordproject/markdown-transform",
"title": "",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "@accordproject/markdown-transform"
}
]
}
Thematic Break
{
"$class": "org.accordproject.commonmark.ThematicBreak"
}
Emphasis
{
"$class": "org.accordproject.commonmark.Emph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "They can also, of course, contain "
}
]
}
Strong
{
"$class": "org.accordproject.commonmark.Strong",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "markdown"
}
]
}
Code
{
"$class": "org.accordproject.commonmark.Code",
"text": "hello"
}
CodeBlock
{
"$class": "org.accordproject.commonmark.CodeBlock",
"text": "testing purposes\n"
}
BlockQuote
{
"$class": "org.accordproject.commonmark.BlockQuote",
"nodes": [
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "First line"
}
]
}
]
}
Ordered List
{
"$class": "org.accordproject.commonmark.List",
"type": "ordered",
"start": "1",
"tight": "true",
"delimiter": "period",
"nodes": [...]
}
Unordered List
{
"$class": "org.accordproject.commonmark.List",
"type": "bullet",
"tight": "true",
"nodes": [...]
}
ListItem
{
"$class": "org.accordproject.commonmark.Item",
"nodes": [...]
}
Conditional
{
"$class": "org.accordproject.ciceromark.Conditional",
"whenTrue": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "This is a force majeure"
}
],
"whenFalse": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "This is "
},
{
"$class": "org.accordproject.commonmark.Emph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "not"
}
]
},
{
"$class": "org.accordproject.commonmark.Text",
"text": " a force majeure"
}
],
"name": "forceMajeure"
}
Optional
{
"$class": "org.accordproject.ciceromark.Optional",
"whenSome": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "This applies except for Force Majeure cases in a "
},
{
"$class": "org.accordproject.templatemark.VariableDefinition",
"name": "miles"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": " miles radius."
}
],
"whenNone": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "This applies even in case a force majeure."
}
],
"name": "forceMajeure"
}
Clause
{
"$class": "org.accordproject.ciceromark.Clause",
"name": "clauseName",
"nodes": [
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "...Markdown of the clause..."
}
]
}
]
}
Formula
{
"$class": "org.accordproject.templatemark.Formula",
"dependencies": [],
"code": " formulas ",
"name": "formula_8e04633f576f94d0333aa7cb5a60f69edb9828f3eab05c59db02d2baa56ab685"
}
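Since every entity above nests its children under `nodes`, `whenTrue`/`whenFalse`, or `whenSome`/`whenNone`, a transformer has to recurse over all of those keys. The following is a minimal sketch of such a walk, not the actual transformer code; `walk`, `visit`, and `CHILD_KEYS` are hypothetical names, and the list of child keys is taken only from the JSON samples above.

```javascript
// Keys that can hold child nodes, taken from the JSON samples above.
// Assumption: no other child-bearing keys exist in the real model.
const CHILD_KEYS = ['nodes', 'whenTrue', 'whenFalse', 'whenSome', 'whenNone'];

/**
 * Depth-first walk over a CiceroMark node, calling `visit` on every node.
 * `walk` and `visit` are illustrative names, not part of markdown-transform.
 */
function walk(node, visit) {
    visit(node);
    for (const key of CHILD_KEYS) {
        const children = Array.isArray(node[key]) ? node[key] : [];
        children.forEach(child => walk(child, visit));
    }
}

// Example input: a trimmed version of the Conditional sample above.
const conditional = {
    $class: 'org.accordproject.ciceromark.Conditional',
    whenTrue: [
        { $class: 'org.accordproject.commonmark.Text', text: 'This is a force majeure' },
    ],
    whenFalse: [
        { $class: 'org.accordproject.commonmark.Text', text: 'This is ' },
        {
            $class: 'org.accordproject.commonmark.Emph',
            nodes: [{ $class: 'org.accordproject.commonmark.Text', text: 'not' }],
        },
    ],
    name: 'forceMajeure',
};

// Collect the short entity name (last segment of $class) of every node visited.
const seen = [];
walk(conditional, node => seen.push(node.$class.split('.').pop()));
console.log(seen.join(', ')); // Conditional, Text, Text, Emph, Text
```

A real transformer would emit OOXML inside `visit` instead of collecting names, but the traversal order would be the same.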
Entities and their corresponding OOXML Tag
Heading
<w:pPr>
<w:pStyle w:val="${definedLevels[level].style}"/>
</w:pPr>
<w:r>
<w:rPr>
<w:sz w:val="${definedLevels[level].size * 2}"/>
</w:rPr>
<w:t xml:space="preserve">${sanitizeHtmlChars(value)}</w:t>
</w:r>
Emphasis
<w:r>
<w:rPr>
<w:i />
</w:rPr>
<w:t>${sanitizeHtmlChars(value)}</w:t>
</w:r>
Strong
<w:r>
<w:rPr>
<w:b />
<w:bCs />
</w:rPr>
<w:t>${sanitizeHtmlChars(value)}</w:t>
</w:r>
Text
<w:r>
<w:t xml:space="preserve">${sanitizeHtmlChars(value)}</w:t>
</w:r>
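The Text, Emphasis, and Strong rules above differ only in their run properties, so they could share one helper. This is a rough sketch under assumptions: the body of `sanitizeHtmlChars` is a guess at what the helper referenced above does (its real implementation is not shown in this issue), `textRun` is a hypothetical name, and `xml:space="preserve"` is applied to every run, whereas the Emphasis and Strong rules above omit it.

```javascript
// Hypothetical stand-in for the sanitizeHtmlChars helper referenced above.
// "&" is replaced first so that already-escaped output is not double-escaped.
function sanitizeHtmlChars(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Builds a single <w:r> run; `textRun` is an illustrative name.
function textRun(value, { emph = false, strong = false } = {}) {
    const props = [];
    // Bold is emitted before italic inside <w:rPr>.
    if (strong) props.push('<w:b /><w:bCs />');
    if (emph) props.push('<w:i />');
    const rPr = props.length ? `<w:rPr>${props.join('')}</w:rPr>` : '';
    return `<w:r>${rPr}<w:t xml:space="preserve">${sanitizeHtmlChars(value)}</w:t></w:r>`;
}

console.log(textRun('Try TemplateMark'));
console.log(textRun('markdown', { strong: true }));
console.log(textRun('a < b & c', { emph: true }));
```

Keeping the escaping in one place means every rule gets the same treatment of `&`, `<`, and `>`.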
Paragraph
<w:p>
value
</w:p>
Softbreak
<w:r>
<w:sym w:font="Calibri" w:char="2009" />
</w:r>
Variable
<w:sdt>
<w:sdtPr>
<w:rPr>
<w:sz w:val="24"/>
</w:rPr>
<w:alias w:val="${titleGenerator(title, type)}"/>
<w:tag w:val="${tag}"/>
</w:sdtPr>
<w:sdtContent>
<w:r>
<w:rPr>
<w:sz w:val="24"/>
</w:rPr>
<w:t xml:space="preserve">${sanitizeHtmlChars(value)}</w:t>
</w:r>
</w:sdtContent>
</w:sdt>
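The Variable rule above interpolates values into both attributes (`w:alias`, `w:tag`) and text, so attribute values need quote escaping as well. Below is a small sketch of how the rule might be wrapped in a function. `variableSdt` and `escapeXml` are hypothetical names, and the alias string in the example is an invented placeholder, since the output format of `titleGenerator` is not shown in this issue.

```javascript
// Escapes characters that are unsafe in XML text and attribute values.
// Hypothetical helper; the project's sanitizeHtmlChars may behave differently.
function escapeXml(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Builds the <w:sdt> content control for one variable, mirroring the rule above.
function variableSdt({ alias, tag, value }) {
    return [
        '<w:sdt>',
        '<w:sdtPr>',
        '<w:rPr><w:sz w:val="24"/></w:rPr>',
        `<w:alias w:val="${escapeXml(alias)}"/>`,
        `<w:tag w:val="${escapeXml(tag)}"/>`,
        '</w:sdtPr>',
        '<w:sdtContent>',
        '<w:r>',
        '<w:rPr><w:sz w:val="24"/></w:rPr>',
        `<w:t xml:space="preserve">${escapeXml(value)}</w:t>`,
        '</w:r>',
        '</w:sdtContent>',
        '</w:sdt>',
    ].join('\n');
}

// The alias format here is an assumption made for the example only.
const xml = variableSdt({
    alias: 'deliverable | String',
    tag: 'deliverable',
    value: '"Widgets"',
});
console.log(xml);
```

Note that the value `"Widgets"` is passed through with its quotation marks; they are only escaped, not stripped.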
> Formula / Ergo expressions, Clause, Conditional, Optional, Strong, BlockQuote, CodeBlock, Code, ThematicBreak, Link, Image
Create a checkbox for this. It will be easier to track.
> we also need to think about whether all these will be present in the contract (IMO, Code and CodeBlock generally won't occur in the contract)
Do raise this up in the meeting tomorrow :)
Since you have created the table, don't forget to link this to the next wiki page you will write.
> Entities and their corresponding CiceroMark
Will you also be maintaining the OOXML counterpart? I recommend so.
> Entities and their corresponding CiceroMark
> Will you also be maintaining the OOXML counterpart? I recommend so.
Yeah, I will also maintain them as we start writing the transformer for it.
@algomaster99 @dselman
A question regarding the variable transformer.
When we rendered in the add-in, some variables had extra quotation marks. For more reference see here. The quotation marks were removed.
The tests now require these quotation marks in the transformer, as shown in the above image (the test fails if there are no quotation marks).
How should we proceed? I have come up with two solutions:
- Stop stripping the quotation marks altogether (rendering the probably unnecessary quotes).
- Stop stripping the quotation marks in the transformer so that the test passes. The problem is that if we convert CiceroMark->OOXML via the transformer or markus-cli (possibly in the future) and then open the file in the add-in, we will have double quotes in variable values. In addition, if we add another template using the add-in, those values won't have double quotes.
PS: I am not a fan of either approach.
Where did you get the CiceroMark for testing? If you got it from the latest template, change the variable transformer to enclose variables in "".
@algomaster99
I got it from acceptance-of-delivery.json.
As for the variable transformer, do you mean CiceroMarkToOOXML or OOXMLtoCiceroMark?
We strip off quotation marks in the former.
Also, enclosing variable values in "" in the latter won't make sense, as variables of type DateTime, Number, etc. don't have quotes around them.
First of all, that acceptance-of-delivery.json is parsed from an older version of the template. You might want to update that. Next, if the latest parsed CiceroMark still encloses its variables in "", you don't need to add extra code for enclosing the value in "" or stripping the "" off, because when you iterate through the CiceroMark, you will get the value as "\"Party A\"" and not "Party A". But if it does not enclose the value in "", again, you don't need to add extra code, because you will simply get "Party A" as the value and not "\"Party A\"".
Overall, I don't see why we should be concerned about explicitly stripping or adding "" around the variable value.
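To make the point concrete, here is a small sketch using the Variable sample from the issue description. It only illustrates the argument above and is not transformer code; `preservesValue` and `stripQuotes` are hypothetical helpers. The quotation marks are part of the parsed string itself, so passing the value through untouched keeps it intact, while stripping changes it.

```javascript
// The Variable sample from the issue description, as raw JSON text.
const json = '{"$class":"org.accordproject.ciceromark.Variable",'
    + '"value":"\\"Widgets\\"","name":"deliverable","elementType":"String"}';

const variable = JSON.parse(json);

// The parsed value already contains the quotation marks as ordinary characters.
console.log(variable.value);        // "Widgets"
console.log(variable.value.length); // 9

// Hypothetical helpers, for comparison only.
const stripQuotes = value => value.replace(/^"|"$/g, '');
const preservesValue = transform => transform(variable.value) === variable.value;

console.log(preservesValue(value => value)); // true: pass-through keeps the value
console.log(preservesValue(stripQuotes));    // false: stripping alters the value
```

With pass-through, a value that arrives quoted stays quoted and an unquoted one stays unquoted, so no per-type handling is needed.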
@algomaster99
> Overall, I don't see why we should be concerned about explicitly stripping or adding "" around the variable value.
The main reason was that some variable values were wrapped in "" and some were not. The "" appeared basically only around strings, so I thought it might be unintentional and that the quotes could have been inserted by mistake during the transformation.
@algomaster99 I will keep the values as they are in the transformers.
Exactly my point :)
@K-Kumar-01 from now on, don't mention issue numbers in every commit you push. As you can see, it has cluttered this PR. I think mentioning the issue number should only be necessary in the PR title, as your PRs are squashed anyway.
@algomaster99 @dselman
There is no such thing as a blockquote in MS Word. It basically involves styling a given paragraph. References: videos, styling reference for shading.
So based on these, can you decide on the specifications we need for the blockquote?
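As a starting point for that discussion, a blockquote could be rendered as an ordinary paragraph with a left indent and background shading. The sketch below is only one possible styling, not an agreed specification: `blockQuoteParagraph` is a hypothetical name, and the 720-twip indent and `F2F2F2` fill are placeholder values chosen for the example.

```javascript
// Wraps already-generated run XML in a paragraph styled to look like a blockquote.
// Assumptions: indent of 720 twips (0.5 inch) and a light grey fill; both are
// placeholders until the specification is decided.
function blockQuoteParagraph(runsXml, { indentTwips = 720, fill = 'F2F2F2' } = {}) {
    return [
        '<w:p>',
        '<w:pPr>',
        // Background shading for the whole paragraph.
        `<w:shd w:val="clear" w:color="auto" w:fill="${fill}"/>`,
        // Left indentation, measured in twentieths of a point.
        `<w:ind w:left="${indentTwips}"/>`,
        '</w:pPr>',
        runsXml,
        '</w:p>',
    ].join('\n');
}

const quote = blockQuoteParagraph(
    '<w:r><w:t xml:space="preserve">First line</w:t></w:r>'
);
console.log(quote);
```

A named paragraph style referenced through `<w:pStyle>` would be an alternative, at the cost of having to ship the style definition with the document.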
Also, for `inline code`, I am thinking of this formatting:

Thanks in advance:)