middleman-blog

Remove Nokogiri dependency by using Oga as an alternative HTML parser

Open 5t111111 opened this issue 10 years ago • 12 comments

As mentioned in #105, sometimes installing Nokogiri could be the hardest part of setting up gems.

I've tried removing the Nokogiri dependency by using Oga, a recently introduced HTML parser, instead. Although Oga is still a bit young, I thought it was worth considering.

5t111111 avatar Sep 30 '14 11:09 5t111111

Coverage Status

Coverage increased (+0.07%) when pulling 967f212ac9419de2989d2e34fcb42c6cf4c3fcf2 on 5t111111:remove-nokogiri-dependency into a5d75eeeb63157b2a2a71e56eb55392f594bcbae on middleman:master.

coveralls avatar Sep 30 '14 11:09 coveralls

I had exactly the same thought when Oga was released, this is a solid improvement! Thanks for coding it up.

In my experience Nokogiri is always a pain to install. I also thought it was overkill to use a full HTML parser in the first place, but the link in that comment explains why this process is required (posting it here for future reference).

Arcovion avatar Sep 30 '14 16:09 Arcovion

I like this a lot. Pinging @bhollis in case he has any opinion.

tdreyno avatar Sep 30 '14 18:09 tdreyno

I'm not sure what we gain by moving from a mature, well-tested library to a brand new one. Oga still has a C dependency.

bhollis avatar Oct 05 '14 03:10 bhollis

@bhollis Nokogiri continues to have many compatibility problems. Last I checked, it didn't work on x64 Windows, and it's very buggy on JRuby. To install Nokogiri on this machine I even downgraded Ruby to x86. In fact, I just tried gem install nokogiri again and, well... this is the result. Looks like I won't be upgrading it any time soon; it's just not worth the trouble. Note that libraries with C extensions compile fine for me 90% of the time; Nokogiri is the main exception.

Oga doesn't have these problems as far as I can tell, it just works: https://github.com/YorickPeterse/oga#why-another-htmlxml-parser

Arcovion avatar Oct 05 '14 14:10 Arcovion

I second that. I even put a humble bounty on Nokogiri's Win 64 compat issue. The issue was marked resolved and the bounty claimed, but it took so long that I had to migrate my home-work computer to Linux. I didn't bother to check the fix. @arcovion says it's still broken, and I don't have the slightest doubt that it is.

lolmaus avatar Oct 05 '14 15:10 lolmaus

Fair enough. I'll have to try it out on my site, but I'll give it a shot. If it works, I'll probably make Oga an actual dependency rather than an implicit one.

bhollis avatar Oct 05 '14 19:10 bhollis

@bhollis Agreed, we can add it to core MM for other parsing if it's easy to install.

tdreyno avatar Oct 06 '14 01:10 tdreyno

Coverage Status

Coverage increased (+0.07%) when pulling 16339c83c9d22cd95fed609775fccb9600c0d5f5 on 5t111111:remove-nokogiri-dependency into 07e2f45301e8f41c18af86075d9169c2fbcc9f37 on middleman:master.

coveralls avatar Oct 31 '14 07:10 coveralls

@5t111111 Hiya, I am looking at integrating this pull request, as it seems Oga is well supported now and is on version 2.x.

It was also a dream to install and suggested.

However, I have a recursion issue going on with the TextTruncator method. I wondered if you could take a look at this pull request with the V4 branch.

Many thanks

Ian

iwarner avatar May 22 '17 21:05 iwarner

@5t111111 I have tracked this down to the files having a .markdown extension. It seems it never hits the

break if remaining_length <= 0

and recurses around into negative hell

very strange

iwarner avatar May 22 '17 22:05 iwarner
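
For readers following the truncation problem described above, here is a minimal sketch of how a recursive truncator can rely on a guard like break if remaining_length <= 0. It is an illustration under stated assumptions, not the code from this pull request: TextNode and ElementNode are hypothetical stand-ins for parser nodes (they are not the Nokogiri or Oga API), and the text-cutting rule is simplified to a plain character cut.

```ruby
# Hypothetical stand-ins for parser nodes; NOT the Nokogiri or Oga API.
TextNode = Struct.new(:content) do
  def inner_text
    content
  end

  # Cut the text to at most max_length characters and append the ellipsis.
  def truncate(max_length, ellipsis)
    return self if content.length <= max_length
    TextNode.new(content[0, max_length] + ellipsis)
  end
end

ElementNode = Struct.new(:name, :children) do
  def inner_text
    children.map(&:inner_text).join
  end

  # Rebuild the element child by child until the length budget is spent.
  def truncate(max_length, ellipsis)
    return self if inner_text.length <= max_length

    truncated = ElementNode.new(name, [])
    children.each do |child|
      remaining_length = max_length - truncated.inner_text.length
      # Without this guard, later children would be truncated with a
      # zero or negative budget.
      break if remaining_length <= 0
      truncated.children << child.truncate(remaining_length, ellipsis)
    end
    truncated
  end
end

doc = ElementNode.new('p', [
  TextNode.new('Hello world, '),
  ElementNode.new('em', [TextNode.new('more')])
])

puts doc.truncate(10, '...').inner_text # prints "Hello worl..."
```

In this sketch the appended ellipsis counts toward the length already used, so the budget goes negative after the first child and the guard stops the loop before the em element is visited. If the guard is never reached for some inputs, as reported for .markdown files, each nested call would receive a non-positive max_length.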

I'm excited to see this happen!

bhollis avatar May 23 '17 04:05 bhollis