crawler
Simple Java web crawler
Dear Github OSS Developer, My name is Raula Kula, a research assistant prof at Osaka University, Japan. Currently I am studying the maintenance of Maven Libraries. I am currently focused...
Today the body and headers are stored for every page, regardless of whether they will be used. Look for a cleaner solution where objects are only in memory...
Add the ability to supply multiple domains to follow.
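One way this could look is a small host allow-list that the crawler consults before following a link. This is only a sketch: `DomainFilter` and `shouldFollow` are hypothetical names that are not part of this project, and the sketch assumes exact host matching (subdomains would need to be listed separately).

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper (not from this project): checks a URL against several allowed hosts. */
class DomainFilter {
    private final Set<String> allowedHosts;

    DomainFilter(List<String> hosts) {
        // Host names are case-insensitive, so normalize them once up front.
        this.allowedHosts = hosts.stream()
                .map(h -> h.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
    }

    /** Returns true only if the URL parses and its host is exactly one of the allowed hosts. */
    boolean shouldFollow(String url) {
        try {
            String host = URI.create(url).getHost();
            return host != null && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // Malformed URLs are simply not followed.
            return false;
        }
    }
}
```

A crawler could build the filter once, for example `new DomainFilter(List.of("example.com", "example.org"))`, and call `shouldFollow` on each extracted link.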
For example, if you want to fetch at most 10 links from a page, take the first 10 and return them.
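A minimal sketch of that limit, assuming the links of a page are already available as a list in document order. `LinkLimiter` and `firstLinks` are hypothetical names, not part of this project.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not from this project): caps the number of links taken from a page. */
class LinkLimiter {

    /** Returns the first {@code max} links in their original order, or all of them if there are fewer. */
    static List<String> firstLinks(List<String> links, int max) {
        if (max < 0) {
            throw new IllegalArgumentException("max must be non-negative");
        }
        return links.stream().limit(max).collect(Collectors.toList());
    }
}
```

With this shape, `LinkLimiter.firstLinks(links, 10)` keeps the first 10 links and drops the rest, so pages with fewer links pass through unchanged.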
Check whether there is a better way to pass cookies etc.