PixivUtil2

Replace lxml with html.parser

Open Baa14453 opened this issue 3 years ago • 3 comments

I didn't notice that lxml required an external dependency. Rather than add it, I've replaced it with the native html.parser. It's apparently slightly slower, but that shouldn't matter here.

Baa14453 avatar Nov 21 '21 14:11 Baa14453

Ah, that was old code. It should have used "html5lib" instead, like the code below.

https://github.com/Nandaka/PixivUtil2/blob/master/PixivBrowserFactory.py#L264

Nandaka avatar Nov 21 '21 14:11 Nandaka

Ok switched to html5lib instead.

I've also added some extra steps to try and preserve some of the text data.

Without Stripping Tags:

[初音ミクシンフォニー2021]公式パンフレットにて鏡音リン・レンのイラストを描かせていただきました。<br />改めましてKAITOさん15周年おめでとうございます!<br /><a href="/jump.php?https%3A%2F%2Fsp.wmg.jp%2Fmikusymphony%2F" target="_blank">https://sp.wmg.jp/mikusymphony/</a>

Original:

[初音ミクシンフォニー2021]公式パンフレットにて鏡音リン・レンのイラストを描かせていただきました。改めましてKAITOさん15周年おめでとうございます!https://sp.wmg.jp/mikusymphony/

New:

[初音ミクシンフォニー2021]公式パンフレットにて鏡音リン・レンのイラストを描かせていただきました。 改めましてKAITOさん15周年おめでとうございます!https://sp.wmg.jp/mikusymphony/ (https://sp.wmg.jp/mikusymphony/)

It's not a great example, because in this case the link's href and its text were the same, but it helps preserve data in cases where they differ, such as a link whose text is "Click Here to go to my site" and whose href is https://important.site.

Original: Click Here to go to my site

New: Click Here to go to my site (https://important.site)
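For illustration, the tag-stripping behaviour described above could be sketched roughly as follows. This is not the code from this PR: it uses the standard library's html.parser rather than html5lib so that it runs without extra dependencies, and the names LinkPreservingStripper, resolve_href and strip_tags_keep_links are hypothetical. It assumes that /jump.php? links carry a percent-encoded target (as in the sample caption above), replaces each br tag with a space, and, unlike the "New" output shown, skips the parenthesised URL when it matches the link text.

```python
from html.parser import HTMLParser
from urllib.parse import unquote

JUMP_PREFIX = "/jump.php?"  # assumption: redirect links look like the sample caption


def resolve_href(href):
    """Decode a jump.php redirect into its target URL; leave other hrefs alone."""
    if href.startswith(JUMP_PREFIX):
        return unquote(href[len(JUMP_PREFIX):])
    return href


class LinkPreservingStripper(HTMLParser):
    """Strip tags, appending a link's target when it differs from its text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []       # output text fragments, joined at the end
        self.in_link = False  # True while inside an <a> element
        self.href = None      # resolved href of the open <a>, if it has one
        self.link_text = []   # text collected inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append(" ")  # keep line breaks as a single space
        elif tag == "a":
            href = dict(attrs).get("href")
            self.href = resolve_href(href) if href else None
            self.in_link = True
            self.link_text = []

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            text = "".join(self.link_text)
            self.parts.append(text)
            # Only add the URL when it carries information the text lacks.
            if self.href and self.href != text:
                self.parts.append(f" ({self.href})")
            self.in_link = False
            self.href = None

    def handle_data(self, data):
        if self.in_link:
            self.link_text.append(data)
        else:
            self.parts.append(data)


def strip_tags_keep_links(html):
    """Return the text of an HTML fragment with link targets preserved."""
    parser = LinkPreservingStripper()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = (
        'See <a href="https://important.site">Click Here to go to my site</a>'
        '<br />and <a href="https://b.example">https://b.example</a>'
    )
    print(strip_tags_keep_links(sample))
```

Because each link is handled when its closing tag is reached, this approach copes with any number of links in one caption.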

Baa14453 avatar Nov 21 '21 19:11 Baa14453

Wait, this won't work if there's more than one link. I will do more testing and update later this week.

Baa14453 avatar Nov 21 '21 23:11 Baa14453