Andy Lester
http://stackoverflow.com/questions/28221901/exclude-the-beginning-from-the-regex
https://github.com/ericchiang/pup
The site uses the old Google Analytics format. Upgrade it.
[This thread on StackOverflow](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-xml-with-php)
Nokogiri at the very least.
http://stackoverflow.com/questions/18274253/how-to-search-for-class-names-in-a-html-file-using-regex has a .NET answer that looks cool.
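That question asks how to find class names with a regex. A parser-based alternative might look like the sketch below. It is not the .NET answer from the thread. It uses Python's standard-library `html.parser`, and the names `ClassNameCollector` and `find_class_names` are hypothetical, chosen for illustration only.

```python
from html.parser import HTMLParser


class ClassNameCollector(HTMLParser):
    """Collects every class name used on any element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.class_names = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "class" and value:
                # The class attribute holds a whitespace-separated list of names
                self.class_names.update(value.split())

    # Self-closing tags such as <br class="x"/> go through handle_startendtag,
    # whose default implementation calls handle_starttag, so they are covered too.


def find_class_names(html):
    """Return the set of class names found in the given HTML string."""
    collector = ClassNameCollector()
    collector.feed(html)
    collector.close()
    return collector.class_names


sample = '<div class="item big"><p class=\'note\'>Hi</p><span class="item">x</span></div>'
print(sorted(find_class_names(sample)))  # ['big', 'item', 'note']
```

Unlike a regex, the parser handles single- or double-quoted attribute values, any attribute order, and multiple classes per element without special cases.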
http://stackoverflow.com/questions/15840996/remove-html-tags-in-c/15841056
http://stackoverflow.com/questions/13915614/extracting-html-fields-through-xpath
Please add comments here if you find a page with good explanations that we can cut'n'paste into htmlparsing.com, such as http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-with-php
Right now the header says "How to parse HTML", but it should say "How to parse HTML with Perl" or whatever the language is.