PixivUtil2 Replace lxml with html.parser

I didn't notice that lxml required an external dependency. Rather than add it, I've replaced it with the native html.parser. It's apparently slightly slower, but that shouldn't matter for this.

Nov 21 '21 14:11 Baa14453

Ah, that was old code. It should have used "html5lib" instead, like the code below.

https://github.com/Nandaka/PixivUtil2/blob/master/PixivBrowserFactory.py#L264

Nov 21 '21 14:11 Nandaka

OK, switched to html5lib instead.

I've also added some extra steps to try to preserve some of the text data.

Without Stripping Tags:

[初音ミクシンフォニー2021]公式パンフレットにて鏡音リン・レンのイラストを描かせていただきました。&lt;br /&gt;改めましてKAITOさん15周年おめでとうございます！&lt;br /&gt;&lt;a href="/jump.php?https%3A%2F%2Fsp.wmg.jp%2Fmikusymphony%2F" target="_blank"&gt;https://sp.wmg.jp/mikusymphony/&lt;/a&gt;

Original:

[初音ミクシンフォニー2021]公式パンフレットにて鏡音リン・レンのイラストを描かせていただきました。改めましてKAITOさん15周年おめでとうございます！https://sp.wmg.jp/mikusymphony/

New:

[初音ミクシンフォニー2021]公式パンフレットにて鏡音リン・レンのイラストを描かせていただきました。 改めましてKAITOさん15周年おめでとうございます！https://sp.wmg.jp/mikusymphony/ (https://sp.wmg.jp/mikusymphony/)

It's not a great example because in this case the href and the link text were the same, but it helps preserve data in cases where they differ, such as a link whose text is "Click Here to go to my site" and whose href is https://important.site.

Original: Click Here to go to my site

New: Click Here to go to my site (https://important.site)

Nov 21 '21 19:11 Baa14453

Wait, this won't work if there's more than one link... I will do more testing and update later this week.

Nov 21 '21 23:11 Baa14453
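
The tag handling described in this thread (turn a line break into a space, keep the link text, and append the link target in parentheses) can be sketched as follows. This is not the code from the pull request: it is a minimal illustration that uses only the standard-library html.parser rather than html5lib, and the names LinkPreservingStripper and strip_tags are hypothetical. Unwrapping the "/jump.php?" redirect prefix is an assumption based on the sample output above. Each link is handled as it closes, so several links in one caption are treated independently.

```python
from html.parser import HTMLParser
from urllib.parse import unquote

JUMP_PREFIX = "/jump.php?"  # assumed redirect prefix, taken from the sample above


class LinkPreservingStripper(HTMLParser):
    """Hypothetical helper: strip tags, keep <br> breaks and link targets."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []   # output fragments, joined at the end
        self.href = None  # target of the <a> that is currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            # A line break becomes a single space so sentences don't run together.
            self.parts.append(" ")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith(JUMP_PREFIX):
                # Unwrap the redirect and decode the percent-encoded target.
                href = unquote(href[len(JUMP_PREFIX):])
            self.href = href

    def handle_endtag(self, tag):
        if tag == "a":
            if self.href:
                # Append the target right after the link text.
                self.parts.append(f" ({self.href})")
            self.href = None

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html_text):
    parser = LinkPreservingStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


print(strip_tags('<a href="https://important.site">Click Here to go to my site</a>'))
# Click Here to go to my site (https://important.site)
```

Like the "New" output shown earlier, this sketch appends the target even when it matches the link text; comparing the two before appending would avoid the duplicate.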
