CiceroMark<->OOXML transformers

Open K-Kumar-01 opened this issue 4 years ago • 12 comments

Feature Request 🛍️

Incorporating the CiceroMark->OOXML transformer and improving the currently implemented OOXML->CiceroMark transformer.

Use Case

It will allow the conversion of Docx files into CiceroMark JSON and vice versa. In addition, Docx files can then be converted into other formats such as PDF or HTML.

Possible Solution

The OOXML->CiceroMark transformer is already implemented. It needs to be extended with the remaining entities to allow a full transformation.

The CiceroMark->OOXML transformer was created in the cicero-word-add-in branch. It needs to be ported here, with some changes, to allow the transformation.

Detailed Description

Currently, the CiceroMark->OOXML transformer supports the following conversions:

| CiceroMark Entity | OOXML Entity |
| --- | --- |
| Text | <w:t> |
| Paragraph | <w:p>Content</w:p> |
| Linebreak | <w:p/> |
| Softbreak | <w:r><w:sym></w:r> |
| Emph | <w:i> |
| Variable | <w:sdt> |
| List Block / List | <w:numPr><w:num w:val={ordered/unordered}/></w:numPr> |
| List Item (Text) | <w:t/> |
| List Item (Variable) | <w:sdt/> |
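
The table above can be read as a lookup from a CiceroMark entity to the OOXML element that carries it. A minimal sketch, assuming the entity name is the last segment of the `$class` value; `OOXML_TAGS` and `ooxmlTagFor` are hypothetical names used for illustration and are not part of markdown-transform:

```javascript
// Hypothetical lookup built from the table above: entity name -> OOXML element.
const OOXML_TAGS = {
  Text: 'w:t',
  Paragraph: 'w:p',
  Linebreak: 'w:p',
  Softbreak: 'w:sym',
  Emph: 'w:i',
  Variable: 'w:sdt',
  List: 'w:numPr',
};

// Returns the OOXML element name for a CiceroMark $class, or null if unsupported.
function ooxmlTagFor(className) {
  const entity = className.split('.').pop();
  return Object.prototype.hasOwnProperty.call(OOXML_TAGS, entity)
    ? OOXML_TAGS[entity]
    : null;
}

console.log(ooxmlTagFor('org.accordproject.commonmark.Emph'));  // w:i
console.log(ooxmlTagFor('org.accordproject.commonmark.Image')); // null
```

Returning `null` for an unmapped entity makes the remaining conversions easy to detect while the transformer is still incomplete.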

Conversions which are left:

  • [x] Strong

  • [x] Code

  • [x] Link

  • [ ] Image

  • [ ] BlockQuote

  • [x] CodeBlock

  • [x] ThematicBreak

  • [x] Clause

  • [x] Optional

  • [x] Conditional

  • [x] Formula / Ergo expressions

Among the remaining conversions, we need to decide which ones deserve high priority and which can be given a lower priority. Furthermore, we also need to think about whether all these will be present in the contract (IMO, Code and CodeBlock generally won't occur in the contract).

Entities and their corresponding Ciceromark

Heading

        {
          "$class": "org.accordproject.commonmark.Heading",
          "level": "2",
          "nodes": [ ... ]
        },
Paragraph

        {
          "$class": "org.accordproject.commonmark.Paragraph",
          "nodes": [ ... ]
        }
Text

             { "$class": "org.accordproject.commonmark.Text",
              "text": "Try TemplateMark" }
Softbreak

        { "$class": "org.accordproject.commonmark.Softbreak" }
Variable

{
          "$class": "org.accordproject.ciceromark.Variable",
          "value": "\"Widgets\"",
          "name": "deliverable",
          "elementType": "String"
        },
Link

              "$class": "org.accordproject.commonmark.Link",
              "destination": "https://github.com/accordproject/markdown-transform",
              "title": "",
              "nodes": [
                {
                  "$class": "org.accordproject.commonmark.Text",
                  "text": "@accordproject/markdown-transform"
                }
              ]
Image
"$class": "org.accordproject.commonmark.Image",
"destination": "https://github.com/accordproject/markdown-transform",
"title": "",
"nodes": [
    {
      "$class": "org.accordproject.commonmark.Text",
      "text": "@accordproject/markdown-transform"
    }
]
Thematic Break
"$class": "org.accordproject.commonmark.ThematicBreak"
Emphasis
 "$class": "org.accordproject.commonmark.Emph",
"nodes": [
  {
    "$class": "org.accordproject.commonmark.Text",
    "text": "They can also, of course, contain "
  }
]
Strong
 "$class": "org.accordproject.commonmark.Strong",
"nodes": [
  {
    "$class": "org.accordproject.commonmark.Text",
    "text": "markdown"
  }
]
Code
{
  "$class": "org.accordproject.commonmark.Code",
  "text": "hello"
}
CodeBlock
{
  "$class": "org.accordproject.commonmark.CodeBlock",
  "text": "testing purposes\n"
}
BlockQuote
{
    "$class": "org.accordproject.commonmark.BlockQuote",
    "nodes": [
      {
        "$class": "org.accordproject.commonmark.Paragraph",
        "nodes": [
          {
            "$class": "org.accordproject.commonmark.Text",
            "text": "First line"
          }
        ]
      }
    ]
  }
Ordered List
{
"$class": "org.accordproject.commonmark.List",
"type": "ordered",
"start": "1",
"tight": "true",
"delimiter": "period",
"nodes": [...]
}
Unordered List
{
"$class": "org.accordproject.commonmark.List",
"type": "bullet",
"tight": "true",
"nodes": [...]
}
ListItem
{
"$class": "org.accordproject.commonmark.Item",
"nodes": [...]
}
Conditional
{
"$class": "org.accordproject.ciceromark.Conditional",
"whenTrue": [
  {
    "$class": "org.accordproject.commonmark.Text",
    "text": "This is a force majeure"
  }
],
"whenFalse": [
  {
    "$class": "org.accordproject.commonmark.Text",
    "text": "This is "
  },
  {
    "$class": "org.accordproject.commonmark.Emph",
    "nodes": [
      {
        "$class": "org.accordproject.commonmark.Text",
        "text": "not"
      }
    ]
  },
  {
    "$class": "org.accordproject.commonmark.Text",
    "text": " a force majeure"
  }
],
"name": "forceMajeure"
}
Optional
{
"$class": "org.accordproject.ciceromark.Optional",
"whenSome": [
  {
    "$class": "org.accordproject.commonmark.Text",
    "text": "This applies except for Force Majeure cases in a "
  },
  {
    "$class": "org.accordproject.templatemark.VariableDefinition",
    "name": "miles"
  },
  {
    "$class": "org.accordproject.commonmark.Text",
    "text": " miles radius."
  }
],
"whenNone": [
  {
    "$class": "org.accordproject.commonmark.Text",
    "text": "This applies even in case a force majeure."
  }
],
"name": "forceMajeure"
}
Clause
{
"$class": "org.accordproject.ciceromark.Clause",
"name": "clauseName",
"nodes": [
  {
    "$class": "org.accordproject.commonmark.Paragraph",
    "nodes": [
      {
        "$class": "org.accordproject.commonmark.Text",
        "text": "...Markdown of the clause..."
      }
    ]
  }
]
}
Formula
{
  "$class": "org.accordproject.templatemark.Formula",
  "dependencies": [],
  "code": " formulas ",
  "name": "formula_8e04633f576f94d0333aa7cb5a60f69edb9828f3eab05c59db02d2baa56ab685"
}
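
Most of the entities above share one shape: containers hold their children in `nodes`, and leaves such as Text, Code and CodeBlock carry a `text` string. A minimal sketch of walking that shape; `collectText` is a hypothetical helper, and it deliberately ignores the `whenTrue`/`whenFalse` and `whenSome`/`whenNone` branches of Conditional and Optional:

```javascript
// Recursively concatenates the "text" of every node reachable through "nodes".
function collectText(node) {
  let out = typeof node.text === 'string' ? node.text : '';
  for (const child of node.nodes || []) {
    out += collectText(child);
  }
  return out;
}

const paragraph = {
  $class: 'org.accordproject.commonmark.Paragraph',
  nodes: [
    { $class: 'org.accordproject.commonmark.Text', text: 'Try ' },
    {
      $class: 'org.accordproject.commonmark.Strong',
      nodes: [{ $class: 'org.accordproject.commonmark.Text', text: 'markdown' }],
    },
  ],
};

console.log(collectText(paragraph)); // Try markdown
```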

Entities and their corresponding OOXML Tag

Heading

<w:pPr>
  <w:pStyle w:val="${definedLevels[level].style}"/>
</w:pPr>
<w:r>
  <w:rPr>
    <w:sz w:val="${definedLevels[level].size * 2}"/>
  </w:rPr>
  <w:t xml:space="preserve">${sanitizeHtmlChars(value)}</w:t>
</w:r>
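
The `size * 2` in the template is there because `w:sz` is expressed in half-points. A minimal sketch of the same template as a function; the `definedLevels` values below are placeholders (the real table is not shown in this issue), and `escapedValue` is assumed to have been XML-escaped by the caller:

```javascript
// Placeholder level table; the real definedLevels is not shown in this issue.
const definedLevels = {
  1: { style: 'Heading1', size: 25 },
  2: { style: 'Heading2', size: 20 },
};

// Builds the heading fragment from the template above.
// `escapedValue` is assumed to be XML-escaped already.
function headingOoxml(level, escapedValue) {
  const { style, size } = definedLevels[level];
  // w:sz is in half-points, so a size in points is doubled.
  return (
    `<w:pPr><w:pStyle w:val="${style}"/></w:pPr>` +
    `<w:r><w:rPr><w:sz w:val="${size * 2}"/></w:rPr>` +
    `<w:t xml:space="preserve">${escapedValue}</w:t></w:r>`
  );
}

// CiceroMark stores the level as a string ("2"), which works as an object key.
console.log(headingOoxml('2', 'Scope'));
```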

Emphasis

<w:r>
    <w:rPr>
        <w:i />
    </w:rPr>
    <w:t>${sanitizeHtmlChars(value)}</w:t>
</w:r>

Strong

<w:r>
    <w:rPr>
        <w:b />
        <w:bCs />
    </w:rPr>
    <w:t>${sanitizeHtmlChars(value)}</w:t>
</w:r>

Text

<w:r>
    <w:t xml:space="preserve">${sanitizeHtmlChars(value)}</w:t>
</w:r>
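
The Emphasis, Strong and Text templates differ only in their run properties, so they can be produced by one function. A minimal sketch under that assumption; `runOoxml` and `escapeXmlText` are hypothetical names, and `escapeXmlText` is only a stand-in for `sanitizeHtmlChars`, whose implementation is not shown here. Unlike the templates above, this sketch adds `xml:space="preserve"` to every run:

```javascript
// Stand-in for sanitizeHtmlChars: escapes characters that are special in XML text.
function escapeXmlText(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Builds a <w:r> run; bold maps to Strong, italic maps to Emph.
function runOoxml(text, { bold = false, italic = false } = {}) {
  let props = '';
  if (bold) props += '<w:b /><w:bCs />';
  if (italic) props += '<w:i />';
  const rPr = props ? `<w:rPr>${props}</w:rPr>` : '';
  return `<w:r>${rPr}<w:t xml:space="preserve">${escapeXmlText(text)}</w:t></w:r>`;
}

console.log(runOoxml('markdown', { bold: true }));
console.log(runOoxml('a < b'));
```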

Paragraph

<w:p>
    value
</w:p>

Softbreak

<w:r>
  <w:sym w:font="Calibri" w:char="2009" />
</w:r>

Variable

<w:sdt>
  <w:sdtPr>
    <w:rPr>
      <w:sz w:val="24"/>
    </w:rPr>
    <w:alias w:val="${titleGenerator(title, type)}"/>
    <w:tag w:val="${tag}"/>
  </w:sdtPr>
  <w:sdtContent>
    <w:r>
      <w:rPr>
        <w:sz w:val="24"/>
      </w:rPr>
      <w:t xml:space="preserve">${sanitizeHtmlChars(value)}</w:t>
    </w:r>
  </w:sdtContent>
</w:sdt>
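
A minimal sketch of the Variable template as a function. The names `variableOoxml` and `escapeXml` are hypothetical; `alias` stands for whatever `titleGenerator(title, type)` returns, and `escapeXml` is a stand-in for `sanitizeHtmlChars` that also escapes double quotes so the result is safe inside attributes:

```javascript
// Stand-in escaper, safe for both XML text and double-quoted attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Wraps a variable value in a content control (<w:sdt>) as in the template above.
function variableOoxml({ alias, tag, value }) {
  const size = '<w:rPr><w:sz w:val="24"/></w:rPr>';
  return (
    '<w:sdt>' +
    `<w:sdtPr>${size}<w:alias w:val="${escapeXml(alias)}"/>` +
    `<w:tag w:val="${escapeXml(tag)}"/></w:sdtPr>` +
    `<w:sdtContent><w:r>${size}` +
    `<w:t xml:space="preserve">${escapeXml(value)}</w:t></w:r></w:sdtContent>` +
    '</w:sdt>'
  );
}

// The value keeps its own quotation marks, as in the Variable JSON example.
console.log(variableOoxml({ alias: 'deliverable', tag: 'deliverable', value: '"Widgets"' }));
```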

K-Kumar-01 avatar May 27 '21 10:05 K-Kumar-01

Formula / Ergo expressions, Clause, Conditional, Optional, Strong, BlockQuote, CodeBlock, Code, ThematicBreak, Link, Image

Create a checkbox for this. It will be easier to track.

we also need to think about whether all these will be present in the contract (IMO, Code and CodeBlock generally won't occur in the contract)

Do raise this up in the meeting tomorrow :)

Since you have created the table, don't forget to link this to the next wiki page you will write.

algomaster99 avatar May 27 '21 13:05 algomaster99

Entities and their corresponding Ciceromark

Will you also be maintaining the OOXML counterpart? I recommend so.

algomaster99 avatar May 30 '21 09:05 algomaster99

Entities and their corresponding Ciceromark

Will you also be maintaining the OOXML counterpart? I recommend so.

Yeah, I will also maintain them as we start writing the transformer for it.

K-Kumar-01 avatar May 30 '21 09:05 K-Kumar-01

@algomaster99 @dselman I have a question about the variable transformer. [Screenshot from 2021-06-16 17-12-25] When we rendered in the add-in, some variables had extra quotation marks (for more reference see here), so the quotation marks were removed.

The tests now require these quotation marks in the transformer, as shown in the above image (the test fails if the quotation marks are missing).

How should we proceed? I have come up with two options:

  1. Stop stripping quotation marks altogether, which means rendering the probably unnecessary quotes.
  2. Stop stripping quotation marks only in the transformer, so that it passes the test. The problem here is that if we convert CiceroMark->OOXML via the transformer or markus-cli (possibly in the future) and then open the file in the add-in, the variable values will have double quotes. In addition, if we then add another template using the add-in, its values won't have double quotes.

PS: I am not a fan of either approach.

K-Kumar-01 avatar Jun 16 '21 11:06 K-Kumar-01

Where did you get the CiceroMark for testing? If you got it from the latest template, change the variable transformer to enclose variables in "".

algomaster99 avatar Jun 16 '21 12:06 algomaster99

@algomaster99 I got it from acceptance-of-delivery.json. By variable transformer, do you mean CiceroMarkToOOXML or OOXMLtoCiceroMark? We strip the quotation marks in the former. Also, enclosing variable values in "" in the latter wouldn't make sense, as variables of type DateTime, Number, etc. don't have quotes around them.

K-Kumar-01 avatar Jun 16 '21 13:06 K-Kumar-01

First of all, that acceptance-of-delivery.json was parsed from an older version of the template. You might want to update it. Next, if the latest parsed CiceroMark still encloses its variables in "", you don't need extra code to add or strip the "", because when you iterate through the CiceroMark you will get the value as "\"Party A\"" and not "Party A". And if it does not enclose the value in "", you again don't need extra code, because you will simply get "Party A" as the value and not "\"Party A\"".

Overall, I don't see why we should be concerned about explicitly stripping or adding "" around the variable value.
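
The point above can be checked directly: whatever the parsed CiceroMark contains is what the transformer sees, so the value can be passed through untouched. A minimal sketch, where `variableText` is a hypothetical helper and the two JSON snippets are made-up inputs for the quoted and unquoted cases:

```javascript
// Returns the variable's value exactly as stored, without adding or stripping quotes.
function variableText(node) {
  return node.value;
}

// Value stored with quotation marks.
const quoted = JSON.parse('{"value": "\\"Party A\\"", "elementType": "String"}');
// Value stored without quotation marks.
const unquoted = JSON.parse('{"value": "Party A", "elementType": "String"}');

console.log(variableText(quoted));   // "Party A"
console.log(variableText(unquoted)); // Party A
```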

algomaster99 avatar Jun 16 '21 13:06 algomaster99

@algomaster99

Overall, I don't see why we should be concerned about explicitly stripping or adding "" around the variable value.

The main reason was that some variable values were wrapped in "" and some were not. The "" appeared basically around strings, so I thought it might have been unintentional and that the quotes could have been inserted by mistake during the transformation.

K-Kumar-01 avatar Jun 16 '21 13:06 K-Kumar-01

@algomaster99 I will keep the values as they are in the transformers.

K-Kumar-01 avatar Jun 16 '21 13:06 K-Kumar-01

Exactly my point :)

algomaster99 avatar Jun 16 '21 13:06 algomaster99

@K-Kumar-01 from now on, don't mention issue numbers in every commit you push. As you can see, it has cluttered this PR. I think mentioning the issue number is only necessary in the PR title, as your PRs are squashed anyway.

algomaster99 avatar Jun 22 '21 08:06 algomaster99

@algomaster99 @dselman There is no such thing as a blockquote in MS Word. It basically involves styling a given paragraph. See the reference videos and the styling reference for shading.

Based on these, can you decide on the specifications we need for the blockquote?

Also, for inline code, I am thinking of this formatting: [Screenshot (70)]

Thanks in advance :)
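
To make the "styled paragraph" idea concrete, here is a minimal sketch of a blockquote rendered as a shaded, indented paragraph. The specification has not been decided in this thread, so the fill colour and indent are placeholder defaults, and `blockQuoteParagraph` is a hypothetical name. The indent is in twips (1440 per inch), so 720 is half an inch:

```javascript
// Wraps already-built runs in a paragraph with shading and a left indent.
// `runsXml` is assumed to be valid, already-escaped <w:r> markup.
function blockQuoteParagraph(runsXml, { fill = 'D9D9D9', indentTwips = 720 } = {}) {
  return (
    '<w:p><w:pPr>' +
    `<w:shd w:val="clear" w:color="auto" w:fill="${fill}"/>` +
    `<w:ind w:left="${indentTwips}"/>` +
    `</w:pPr>${runsXml}</w:p>`
  );
}

console.log(blockQuoteParagraph('<w:r><w:t>First line</w:t></w:r>'));
```

Keeping the colour and indent as parameters means the agreed specification can be dropped in later without touching the structure.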

K-Kumar-01 avatar Jun 23 '21 16:06 K-Kumar-01