
CSV contains citation link text

Open ptrstn opened this issue 5 years ago • 0 comments

Hello,

when a wiki table contains a citation marker (e.g. [2]), the generated CSV includes the marker as plain text. This is probably not desired.

Example: https://de.wikipedia.org/wiki/Liste_traditioneller_Radikale#Tabelle_der_Radikale


Output:

Nr.,Zeichen (Varianten),Pīnyīn,Bedeutung und Anmerkungen,Häufig-keit,Kurz-zeichen,Beispiele
147,.mw-parser-output .Hant{font-size:110%}見,jiàn,sehen,161,见[2],規親覺觀
148,角,jiǎo,"Horn, Ecke",158,,觚解觕觥觸
149,言 (訁 links),yán,"sprechen, Wort",861,讠[2]links,誁詋詔評詗詥試詧

(The [2] is the undesired text; it is meaningless without the footnote it points to.)

The HTML responsible for this is:

<td>
   <link rel="mw-deduplicated-inline-style" href="mw-data:TemplateStyles:r184932629">
   <span lang="zh-Hans" class="Hans">见</span>
   <sup id="cite_ref-s_2-1" class="reference">
      <a href="#cite_note-s-2">[2]</a>
   </sup>
</td>

Could the citation links (hyperlinks whose text is a number in square brackets) be removed when generating the CSV? In other words, drop every <a> tag that is enclosed in a <sup> tag with class="reference".
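To make the request concrete, here is a minimal sketch of the filtering I have in mind. It is not wikitable2csv's actual code: it uses Python's standard `html.parser` only for illustration, and the names `CellTextExtractor` and `cell_text` are made up for this example. It assumes citations always appear as a <sup> whose class list contains "reference", as in the HTML above.

```python
from html.parser import HTMLParser


class CellTextExtractor(HTMLParser):
    """Collect the text of one table cell, skipping <sup class="reference"> subtrees.

    Hypothetical helper for illustration; not part of wikitable2csv.
    """

    def __init__(self):
        super().__init__()
        self.parts = []
        # One flag per currently open <sup>: True if it is a citation marker.
        self.sup_stack = []

    def handle_starttag(self, tag, attrs):
        if tag == "sup":
            classes = (dict(attrs).get("class") or "").split()
            self.sup_stack.append("reference" in classes)

    def handle_endtag(self, tag):
        if tag == "sup" and self.sup_stack:
            self.sup_stack.pop()

    def handle_data(self, data):
        # Keep text only when no enclosing <sup> is a citation.
        if not any(self.sup_stack):
            self.parts.append(data)


def cell_text(html):
    """Return the cell's text with citation markers removed and whitespace collapsed."""
    parser = CellTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


cell = """<td>
   <link rel="mw-deduplicated-inline-style" href="mw-data:TemplateStyles:r184932629">
   <span lang="zh-Hans" class="Hans">见</span>
   <sup id="cite_ref-s_2-1" class="reference">
      <a href="#cite_note-s-2">[2]</a>
   </sup>
</td>"""
print(cell_text(cell))
```

An ordinary superscript without the "reference" class (for example an exponent) is left untouched, so only citation markers would disappear from the CSV.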

ptrstn, Aug 13 '20 21:08