Figure environments with more than one figure

Open delta-river opened this issue 4 years ago • 2 comments

Problem:

Currently, when a figure environment contains more than one figure, tools/preprocess.py causes two problems.

The following example is from 2010.00710:

  1. add-sourcing the span "Effect of the number of neighbors" in the caption of Figure 2 also add-sources the span "Effect of datastore size on the" in the caption of Figure 3
  2. no img path is written for either Figure 2 or Figure 3

Cause:

  1. the gd_words in the captions of Figures 2 and 3 receive the same ids, because both figcaptions share the same parent (and therefore its id) when embed_word_span_tags is applied [corresponding part of preprocess.py] [corresponding part of the source html]
  2. the img path is added only when there is exactly one figcaption in a figure environment [corresponding part of preprocess.py]
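
A minimal sketch of how cause 1 can arise, assuming word ids are built only from the parent id plus a word index. The helper `word_ids`, the id format, and the figure id `S4.F2` are hypothetical and are not taken from preprocess.py; only the two caption spans come from the example above.

```python
def word_ids(base_id, caption):
    """Hypothetical id scheme: one id per word, derived only from base_id."""
    return [f"{base_id}.w{i}" for i, _ in enumerate(caption.split(), start=1)]


# Two caption spans that sit under the same ltx_figure element.
# The figure id is made up for illustration.
figure_id = "S4.F2"
caption_2 = "Effect of the number of neighbors"
caption_3 = "Effect of datastore size on the"

ids_2 = word_ids(figure_id, caption_2)
ids_3 = word_ids(figure_id, caption_3)

# Both spans have six words, so every id of the first span is reused
# by the second span when the parent id is the only distinguishing part.
shared = set(ids_2) & set(ids_3)
print(sorted(shared))
```

Under this assumption, anything keyed on those ids cannot tell the two spans apart, which would match the behaviour described in problem 1.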

delta-river, Apr 14 '22 06:04

Fixing both of the causes above requires redesigning the naming scheme for:

  • id of gd_words
  • path of img

so that multiple figcaptions in one ltx_figure can be handled.

In other words, each figcaption also needs its own unique id; currently only the parent id (the id of the ltx_figure) is used.
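
One possible direction, as a rough sketch only: give each figcaption an id derived from the ltx_figure id plus a caption index, and key both the word ids and the img path on that caption id. The `.capN` and `.wN` suffixes, the sample markup, the file names, and the assumption that the k-th img belongs to the k-th figcaption are all hypothetical; the real structure produced for the paper may differ.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified markup: one ltx_figure holding two img/figcaption pairs.
SAMPLE = """
<figure class="ltx_figure" id="S4.F2">
  <img src="x2.png"/>
  <figcaption>Effect of the number of neighbors</figcaption>
  <img src="x3.png"/>
  <figcaption>Effect of datastore size on the</figcaption>
</figure>
"""


def assign_caption_ids(figure):
    """Give every figcaption its own id and derive word ids from it."""
    word_ids = {}
    for k, cap in enumerate(figure.iter("figcaption"), start=1):
        cap_id = f"{figure.get('id')}.cap{k}"  # hypothetical id format
        cap.set("id", cap_id)
        n_words = len((cap.text or "").split())
        word_ids[cap_id] = [f"{cap_id}.w{i}" for i in range(1, n_words + 1)]
    return word_ids


figure = ET.fromstring(SAMPLE.strip())
ids = assign_caption_ids(figure)

# Assumption: images and captions appear in matching order.
captions = list(figure.iter("figcaption"))
images = list(figure.iter("img"))
img_paths = {cap.get("id"): img.get("src") for cap, img in zip(captions, images)}

all_ids = [w for ws in ids.values() for w in ws]
print(len(all_ids), len(set(all_ids)))
print(img_paths)
```

With caption-level ids, the word ids of the two captions no longer overlap, and each caption can carry its own img path instead of relying on there being a single figcaption.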

delta-river, Apr 15 '22 00:04

Besides, the modifications above may also be necessary for ltx_table: https://github.com/wtsnjp/MioGatto/blob/b082f2225709c1f5861c4082207415a764c80c51/tools/preprocess.py#L106-L118

delta-river, Apr 15 '22 00:04