Rewrite this one in C(++) or reduce functionality
Doing it the quick-and-dirty way, as in http://github.com/tautologistics/node-htmlparser/blob/master/utils_example.js, is so much faster! I have ported a piece of code from PHP to node, and Apricot is really slow by comparison when processing many large HTML files.
Rewrite what? Are you able to provide some benchmarks? If you just want a fast parser, use htmlparser.
Yes, I am still using htmlparser for parsing, and it's pretty fast. I tried selecting elements with Sizzle selectors, but doing it manually with the rudimentary DOM support of htmlparser is many times faster. I will post some benchmarks in a few days.
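To make "doing it manually" concrete, here is a minimal sketch of a plain depth-first walk that collects elements by tag name. It assumes nodes shaped like { type, name, attribs, children }, which is modeled on htmlparser's DOM output and should be checked against the version in use. The helper name collectByTagName and the sample tree are hypothetical, and the tree is built by hand so the snippet runs without any dependency.

```javascript
// Hypothetical helper: collect every element with the given tag name by
// walking a plain-object DOM tree depth-first.
// Assumed node shape (modeled on htmlparser's output; verify for your version):
//   { type: 'tag', name: 'a', attribs: { href: '...' }, children: [ ... ] }
function collectByTagName(nodes, tagName, found = []) {
  for (const node of nodes) {
    if (node.type === 'tag' && node.name === tagName) {
      found.push(node);
    }
    // Text and comment nodes have no children, so guard before recursing.
    if (Array.isArray(node.children)) {
      collectByTagName(node.children, tagName, found);
    }
  }
  return found;
}

// Hand-built sample tree standing in for a parsed document.
const sampleDom = [
  { type: 'tag', name: 'div', children: [
    { type: 'tag', name: 'a', attribs: { href: '/one' }, children: [
      { type: 'text', data: 'one' }
    ] },
    { type: 'tag', name: 'p', children: [
      { type: 'tag', name: 'a', attribs: { href: '/two' }, children: [] }
    ] }
  ] }
];

const links = collectByTagName(sampleDom, 'a');
console.log(links.map((node) => node.attribs.href)); // [ '/one', '/two' ]
```

A walk like this visits each node once and does no selector parsing or matching, which may be part of why it comes out faster than a general selector engine for simple tag lookups.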
I'm having some performance troubles as well. Consider the following code:
https://gist.github.com/97db243b2ba3a3f9f458
time node index.js
Documented loaded
Elements found
real 0m15.752s
user 0m12.399s
sys 0m0.061s
Pretty much all of those 15 seconds are spent executing the find('a') call on the document, so something seems wrong here.
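For anyone who wants to confirm where the time goes, a rough sketch of timing each phase separately follows. The helper timePhase is hypothetical, it only handles synchronous functions, and the two workloads are stand-ins rather than the real load and find('a') calls from the gist. It relies on process.hrtime.bigint(), which needs a reasonably recent Node version.

```javascript
// Hypothetical helper: run a synchronous function once and report how long
// it took in milliseconds.
function timePhase(label, fn) {
  const start = process.hrtime.bigint();
  const result = fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${elapsedMs.toFixed(1)} ms`);
  return { result, elapsedMs };
}

// Stand-in workloads; replace these with the real parse and lookup steps.
const loaded = timePhase('load', () => Array.from({ length: 1000 }, (_, i) => i));
const matched = timePhase('find', () => loaded.result.filter((n) => n % 2 === 0));
```

Timing the phases in-process like this separates parsing from element lookup, which the total from the time command cannot do.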
--fg
Interesting, thanks, I'll dig in.
Thanks for the quick reply. I was thinking of creating a small node app that lists all existing node.js modules by scraping various sources and lets you sort things by GitHub forks or Google backlinks.