Docling document - math equations modelling
Equations are a coming-soon feature, so it may be too early to discuss this, but I would like to understand how equations can be represented in the Docling Document model.
Basically there are two cases:
- Display Equations
- Inline Equations
Display equations seem to be just TextItems, with either the sanitized representation containing the TeX formula or an extension that adds a "tex" attribute, like this:
{
  "self_ref": "#/texts/47",
  "parent": {
    "cref": "#/body"
  },
  "children": [],
  "label": "formula",
  "prov": [
    …
  ],
  "orig": "Attention( Q,K,V ) = softmax( QK T \u221a d k ) V (1)",
  "text": "Attention( Q,K,V ) = softmax( QK T \u221a d k ) V (1)",
  "tex": "\\[\\mathrm{Attention}(Q, K, V) = \\mathrm{softmax}(\\frac{QK^T}{\\sqrt{d_k}})V\\]"
},
Modeling inline equations might require the hierarchy supported by the Docling Document: a parent paragraph TextItem whose children are a mixture of plain text items and inline equations. For example, the TeX paragraph
Where the projections are parameter matrices $W^Q_i \in \mathbb{R}^{\dmodel \times d_k}$, $W^K_i \in \mathbb{R}^{\dmodel \times d_k}$, $W^V_i \in \mathbb{R}^{\dmodel \times d_v}$ and $W^O \in \mathbb{R}^{hd_v \times \dmodel}$.
can be represented with a tree like this:
- parent: paragraph
  - child text: "Where the projections are parameter matrices "
  - child inline equation: "$W^Q_i \in \mathbb{R}^{\dmodel \times d_k}$"
  - child text: ", "
  - child inline equation: "$W^K_i \in \mathbb{R}^{\dmodel \times d_k}$"
  - child text: ", "
  ...
  - child text: "."
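To make the tree above concrete, here is a minimal sketch in plain Python that splits a paragraph into alternating text and inline-equation children. It does not use the Docling API: `Segment` and `split_inline_math` are hypothetical names made up for illustration, and the splitter assumes inline math is delimited by single `$` signs with no escaped `\$` in the text.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One child of the paragraph (hypothetical type, not a Docling class)."""
    kind: str     # "text" or "inline_equation"
    content: str  # plain text, or TeX source without the $ delimiters


# Assumption: inline math is written as $...$ and never contains a "$".
INLINE_MATH = re.compile(r"\$([^$]+)\$")


def split_inline_math(paragraph: str) -> list[Segment]:
    """Split a paragraph into text and inline-equation segments, in order."""
    segments: list[Segment] = []
    pos = 0
    for match in INLINE_MATH.finditer(paragraph):
        if match.start() > pos:
            # Plain text between the previous equation and this one.
            segments.append(Segment("text", paragraph[pos:match.start()]))
        segments.append(Segment("inline_equation", match.group(1)))
        pos = match.end()
    if pos < len(paragraph):
        # Trailing text after the last equation.
        segments.append(Segment("text", paragraph[pos:]))
    return segments
```

Each equation becomes its own child, and the punctuation between equations stays in separate text children, so concatenating the children in order reproduces the paragraph.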
Questions:
- How will inline equations be implemented in Docling (and will they be implemented at all)? For example, a tree relationship within a paragraph makes more sense if there is provenance information for inline equations and the other child text nodes; otherwise, Markdown like the Nougat output seems more convenient in the paragraph.text field.
- Is using tree structures to model inline equations within a paragraph consistent with the original design?
- If yes, how should the model be extended if needed, for instance to define which text elements require a line break (regular paragraph, display equation) and which do not (inline equation, text between inline equations)?
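The line-break question can be illustrated with a small sketch, again in plain Python rather than the Docling API. The item kinds and the `render_markdown` helper are hypothetical; the only assumption is that each element carries a kind that says whether it is block-level (paragraph, display equation) or inline (inline equation, text between inline equations).

```python
# Hypothetical kinds: block-level items start a new block, everything else is inline.
BLOCK_KINDS = {"paragraph", "display_equation"}


def render_markdown(items: list[tuple[str, str]]) -> str:
    """Render (kind, content) pairs, breaking lines only around block-level items."""
    blocks: list[str] = []
    inline_buffer: list[str] = []  # inline pieces that belong to the same line

    def flush() -> None:
        # Join buffered inline pieces into one block without line breaks.
        if inline_buffer:
            blocks.append("".join(inline_buffer))
            inline_buffer.clear()

    for kind, content in items:
        if kind in BLOCK_KINDS:
            flush()
            if kind == "display_equation":
                blocks.append(f"$${content}$$")
            else:
                blocks.append(content)
        elif kind == "inline_equation":
            inline_buffer.append(f"${content}$")
        else:  # plain text between inline equations
            inline_buffer.append(content)
    flush()
    return "\n\n".join(blocks)
```

With this rule, inline equations and the text around them are concatenated on one line, while paragraphs and display equations are separated by blank lines.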
Thank you!
@vitaly-d Since docling 2.17.0 we have code and equation transcription, which is limited to "display equations". We cannot yet detect inline equations as parts of paragraphs.
Could you please let us know if "inline equation detection" is going to be supported soon? It'd be super useful if you could support it.
@ShayanTalaei Yes, it kind of is already. We have it supported using the INLINE group functionality in the DoclingDocument and with the new VLM, we should be able to tackle it.