MioGatto
Figure environments with more than two figures
Problem:
Currently, when there are more than two figures in a figure environment, tools/preprocess.py causes two problems.
The following example is from arXiv:2010.00710:
- add-sourcing the span "Effect of the number of neighbors" in the caption of Figure 2 also add-sources the span "Effect of datastore size on the" in the caption of Figure 3
- no img path is written for either Figure 2 or Figure 3

Cause:
- the gd_words in the captions of Figures 2 and 3 have the same ids, because both figcaptions share the same parent (and its id) when embed_word_span_tags is applied [corresponding part of preprocess.py] [corresponding part of the source html]
- an img path is added only when there is exactly one figcaption in a figure environment [corresponding part of preprocess.py]
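
To illustrate the first cause, here is a minimal sketch of how the collision can arise. The id scheme (`<parent id>.w<n>`), the helper `word_ids`, and the figure id `S4.F2` are hypothetical and only stand in for whatever embed_word_span_tags actually generates; the point is that any scheme keyed on the parent id alone gives two sibling captions the same word ids.

```python
def word_ids(parent_id, caption):
    """Hypothetical scheme: one id per word, derived from the parent id only."""
    return [f"{parent_id}.w{i}" for i in range(1, len(caption.split()) + 1)]


# Both captions live under the same ltx_figure, so they get the same parent id.
figure_id = "S4.F2"  # hypothetical id, not taken from the real HTML
ids_fig2 = word_ids(figure_id, "Effect of the number of neighbors")
ids_fig3 = word_ids(figure_id, "Effect of datastore size on the")

# Every id that appears in both captions is ambiguous.
collisions = sorted(set(ids_fig2) & set(ids_fig3))
print(collisions)
```

Both spans from the example above are six words long, so under this assumed scheme every word id in one span also appears in the other, which would match the observed behaviour.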
Both causes call for redesigning the naming scheme for:
- the ids of gd_words
- the img paths
so that multiple figcaptions in a single ltx_figure can be handled.
In other words, each figcaption should also get its own unique id; currently only the parent id (the id of the ltx_figure) is used.
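
One possible direction, as a rough sketch rather than a patch for tools/preprocess.py: give each figcaption an index within its ltx_figure and derive both the word ids and the img path lookup from that caption id. The HTML snippet, the `.cap<n>` suffix, the helper names, and the assumption that each img directly precedes its figcaption are all hypothetical. The sketch uses the standard library's `xml.etree.ElementTree` only to stay self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the real source HTML.
SNIPPET = """
<figure class="ltx_figure" id="S4.F2">
  <img src="x2.png"/>
  <figcaption>Effect of the number of neighbors</figcaption>
  <img src="x3.png"/>
  <figcaption>Effect of datastore size on the</figcaption>
</figure>
"""


def caption_image_pairs(figure):
    """Map a unique caption id to (caption text, img path).

    Assumes each img comes right before the figcaption it belongs to.
    """
    pairs = {}
    last_img = None
    n = 0
    for child in figure:
        if child.tag == "img":
            last_img = child.get("src")
        elif child.tag == "figcaption":
            n += 1
            caption_id = f"{figure.get('id')}.cap{n}"
            pairs[caption_id] = (child.text, last_img)
            last_img = None
    return pairs


def word_ids(caption_id, text):
    """Word ids derived from the caption id instead of the figure id."""
    return [f"{caption_id}.w{i}" for i in range(1, len(text.split()) + 1)]


figure = ET.fromstring(SNIPPET.strip())
pairs = caption_image_pairs(figure)
ids = {cid: word_ids(cid, text) for cid, (text, _) in pairs.items()}
```

With ids of this shape, the two captions no longer share any word id, and each caption id maps to its own img path, so neither problem depends on there being exactly one figcaption.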
In addition, the same modifications may also be needed for ltx_table. https://github.com/wtsnjp/MioGatto/blob/b082f2225709c1f5861c4082207415a764c80c51/tools/preprocess.py#L106-L118