
Image with double-newline

Open ivan-kuzma-scx opened this issue 2 years ago • 7 comments

Hello @spencermountain ,

just found one thing with the first sentence while parsing the Jesus topic: https://en.wikipedia.org/wiki/Jesus image image

Cheers

ivan-kuzma-scx avatar Aug 05 '22 16:08 ivan-kuzma-scx

thanks, got a fix for this on dev

spencermountain avatar Aug 07 '22 14:08 spencermountain

Thank you!

ivan-kuzma-scx avatar Aug 09 '22 14:08 ivan-kuzma-scx

Hello @spencermountain , have found another one. Not sure if they are related.

https://en.wikipedia.org/wiki/Byzantine_Empire

image

ivan-kuzma-scx avatar Sep 28 '22 18:09 ivan-kuzma-scx

thanks @Patrik-scx - i've reproduced this below:

let str = `The '''Byzantine Empire''' {{IPAc-en|z|{|n}} also referred to as the Eastern Roman Empire`
let doc = wtf(str)
console.log(doc.sentences()[0].text())

looks like the IPAc template is getting caught on the { character. Will add this to the next release. cheers

spencermountain avatar Sep 30 '22 20:09 spencermountain

Cheers

ivan-kuzma-scx avatar Oct 01 '22 15:10 ivan-kuzma-scx

Hello @spencermountain ,

Just found additional meta in the first sentence of "Jewish diaspora". Screenshot 2022-11-29 at 13 23 06

ivan-kuzma-scx avatar Nov 29 '22 12:11 ivan-kuzma-scx

here's the issue - the <br/> tag becomes a double-newline, which is considered two paragraphs, which trips the image parser:

let str = `[[File:Jewish people around the world.svg|thumb|Map of the Jewish diaspora.<br/>
foobar]]`
let doc = wtf(str)
console.log(doc.images())
// []

this may be a stumper. I'm pretty wary of supporting links that span paragraphs. cheers

spencermountain avatar Dec 09 '22 16:12 spencermountain
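
Until the parser handles this case, one possible workaround is to flatten line breaks inside file links before handing the text to the parser. The sketch below is only an illustration: `flattenFileLinks` is a hypothetical helper, not part of wtf_wikipedia, and it assumes that replacing `<br>` tags and newlines inside `[[File:...]]` or `[[Image:...]]` links with a single space is acceptable for your use.

```javascript
// Hypothetical pre-processing helper (not part of wtf_wikipedia).
// Replaces <br> tags and newlines inside [[File:...]] / [[Image:...]] links
// with a single space, so the whole link stays within one paragraph.
function flattenFileLinks(wikitext) {
  const opener = /\[\[(?:File|Image):/gi
  let out = ''
  let pos = 0
  let match
  while ((match = opener.exec(wikitext)) !== null) {
    const start = match.index
    // Walk forward, tracking nested [[...]] links inside the caption.
    let depth = 0
    let end = -1
    for (let i = start; i < wikitext.length - 1; i++) {
      const pair = wikitext.slice(i, i + 2)
      if (pair === '[[') {
        depth += 1
        i += 1
      } else if (pair === ']]') {
        depth -= 1
        i += 1
        if (depth === 0) {
          end = i + 1 // exclusive end index of the file link
          break
        }
      }
    }
    if (end === -1) {
      break // unbalanced link: leave the rest of the text untouched
    }
    out += wikitext.slice(pos, start)
    out += wikitext.slice(start, end).replace(/\s*(?:<br\s*\/?>|\n)\s*/gi, ' ')
    pos = end
    opener.lastIndex = end
  }
  return out + wikitext.slice(pos)
}

const raw = `[[File:Jewish people around the world.svg|thumb|Map of the Jewish diaspora.<br/>
foobar]]`
console.log(flattenFileLinks(raw))
```

The flattened string can then be passed to `wtf()` as usual. Whether `doc.images()` picks up the image afterwards would need checking against the installed version. Text outside file links, including real paragraph breaks, is left as it is.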