CSV output contains CSS code lines from style tag

Open ptrstn opened this issue 5 years ago • 0 comments

Hello,

I used your website and ran into some rather unexpected behavior. I tried to parse the table at https://de.wikipedia.org/wiki/Liste_traditioneller_Radikale, which, for the most part, resulted in a great CSV table.

Only rows 64 and 147 contained unwanted CSS, e.g. .mw-parser-output .Hant{font-size:110%}:

Nr.,Zeichen (Varianten),Pīnyīn,Bedeutung und Anmerkungen,Häufig-keit,Kurz-zeichen,Beispiele
1,一,yī,eins,42,,七三不世
2,丨,gǔn,Vertikalstrich,21,,中
3,丶,zhǔ,Tropfstrich,10,,丸主
[...]
64,"手 (.mw-parser-output .Hans{font-size:110%}才,扌 links)",shǒu,"Hand, in der Hand halten",1.203,,手打持掛挙
[...]
147,.mw-parser-output .Hant{font-size:110%}見,jiàn,sehen,161,见[2],規親覺觀
[...]

When I inspected the source code of the wiki page, I saw that this text is indeed embedded in the HTML table itself (only for these two rows, though):

<td>
   <link rel="mw-deduplicated-inline-style" href="mw-data:TemplateStyles:r184932623">
   <span lang="zh-Hani" class="Hani">手</span> (
   <style data-mw-deduplicate="TemplateStyles:r184932629">.mw-parser-output .Hans{font-size:110%}</style>
   <span lang="zh-Hans" class="Hans">才</span>,
   <link rel="mw-deduplicated-inline-style" href="mw-data:TemplateStyles:r184932623">
   <span lang="zh-Hani" class="Hani">扌</span> <small>links</small>)
</td>
<td>
   <style data-mw-deduplicate="TemplateStyles:r184932626">.mw-parser-output .Hant{font-size:110%}</style>
   <span lang="zh-Hant" class="Hant">見</span>
</td>

Can the CSS code inside any <style></style> tag, or the style tag itself, be removed when generating the CSV table?
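
To illustrate the idea, here is a minimal sketch in Python using only the standard library's html.parser. It is not wikitable2csv's actual code; the class name CellTextExtractor and the helper cell_texts are hypothetical names made up for this example. It collects the text of each <td> and ignores anything between <style> and </style>:

```python
from html.parser import HTMLParser


class CellTextExtractor(HTMLParser):
    """Collect the text of each <td>, skipping <style> contents.

    Hypothetical helper for illustration only; not part of wikitable2csv.
    """

    def __init__(self):
        super().__init__()
        self.cells = []          # finished cell texts, in document order
        self._parts = None       # text fragments of the cell being read
        self._in_style = False   # True while inside <style>...</style>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._parts = []
        elif tag == "style":
            self._in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False
        elif tag == "td" and self._parts is not None:
            # Collapse the whitespace left over from the markup's indentation.
            self.cells.append(" ".join("".join(self._parts).split()))
            self._parts = None

    def handle_data(self, data):
        # Keep text only when inside a cell and outside a <style> element.
        if self._parts is not None and not self._in_style:
            self._parts.append(data)


def cell_texts(html):
    """Return the cleaned text of every <td> found in the given HTML."""
    parser = CellTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.cells


# The cell from row 147, copied from the page source quoted above.
sample = (
    '<td>\n'
    '   <style data-mw-deduplicate="TemplateStyles:r184932626">'
    '.mw-parser-output .Hant{font-size:110%}</style>\n'
    '   <span lang="zh-Hant" class="Hant">見</span>\n'
    '</td>'
)
print(cell_texts(sample))  # ['見']
```

In a browser-based tool the equivalent step would presumably be to drop the style elements from each cell before reading its text, but that is an assumption about how the tool is built.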

Thanks!

ptrstn • Aug 12 '20 19:08