hacker-news-digest

question on crawler

Open thiswillbeyourgithub opened this issue 2 years ago • 2 comments

Hi,

I read this page from your doc the other day and was wondering.

Why not just use article extractors made in the past? There is even a GitHub tag for some of them there.

Just wondering, hope you don't mind

thiswillbeyourgithub, Jun 14 '23 15:06

Actually, I started this project almost 9 years ago, in late 2014 (see my first commit), when there were only a few open-source extractors, and they didn't perform well at the time.

One reason to write it from scratch is flexibility and customizability - I can tune the parameters so that it better suits HN posts. One case is the HN comments page: it appears frequently on the front page, but most extractors do not get the right content.

I'll try some of the modern ones later, thanks.

polyrabbit, Jun 15 '23 05:06

Interesting, thanks.

thiswillbeyourgithub, Jun 15 '23 13:06