crawler

Metadata

Simple java web crawler

Readme
Issues

Results 7 crawler issues


Study on OSS maven library vulnerabilities: httpcomponents library

Dear Github OSS Developer, My name is Raula Kula, a research assistant prof at Osaka University, Japan. Currently I am studying the maintenance of Maven Libraries. I am currently focused...

Make HTMLPageResponse less memory intensive

2 comments

Today the body and headers are stored for every page, regardless of whether they will be used. Look for a cleaner solution where objects are only in memory...

Make it possible to follow multiple domains (e.g. http://us.yahoo.com/)

Add the possibility to feed multiple domains to follow.

enhancement
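
One way to picture this request is a host allow-list that the crawler consults before following a link. The sketch below is only an illustration under that assumption: `DomainFilter` and `shouldFollow` are hypothetical names, not part of the project's code, and matching is done on the exact host name.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: decides whether a URL belongs to one of several
// domains the crawler has been told to follow.
class DomainFilter {
    private final Set<String> allowedHosts = new HashSet<>();

    DomainFilter(String... hosts) {
        for (String host : hosts) {
            // Host names are case-insensitive, so normalize once up front.
            allowedHosts.add(host.toLowerCase(Locale.ROOT));
        }
    }

    // Returns true only when the URL parses and its host is on the allow-list.
    boolean shouldFollow(String url) {
        try {
            String host = new URI(url).getHost();
            return host != null
                    && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // Malformed links are never followed.
            return false;
        }
    }
}
```

With `new DomainFilter("us.yahoo.com", "example.com")`, a link to `http://us.yahoo.com/` would be followed while a link to any other host would be skipped. Subdomain matching is left out on purpose; whether it is wanted is not stated in the issue.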

Upgrade to latest HTTPClient 4.3.X

Also update the code to follow the new standard.

enhancement

Check and make sure that the links are sorted in the order they were found, with the first one found first

question

Add parameter to limit the number of fetched links

For example, if you want to fetch at most 10 links from a page, take the first 10 and return them.

enhancement
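
The limit described above, combined with the first-found ordering raised in the previous issue, could be sketched as follows. This is a minimal illustration, not the project's implementation: `LinkLimiter`, `firstLinks`, and the `max` parameter are hypothetical names, and it assumes the links arrive as a list already in page order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: keeps links in the order they were found on the page
// and returns at most `max` of them.
class LinkLimiter {
    static List<String> firstLinks(List<String> linksInPageOrder, int max) {
        if (max < 0) {
            throw new IllegalArgumentException("max must be >= 0");
        }
        // LinkedHashSet drops duplicates while preserving insertion order,
        // so the first occurrence of each link determines its position.
        Set<String> unique = new LinkedHashSet<>(linksInPageOrder);
        List<String> result = new ArrayList<>();
        for (String link : unique) {
            if (result.size() >= max) {
                break;
            }
            result.add(link);
        }
        return result;
    }
}
```

Removing duplicates before applying the limit is a design choice of this sketch; the issue itself does not say whether repeated links should count toward the maximum.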

Cleanup the use of the request header

Check whether there is a better way to pass cookies etc.

enhancement

About

Simple java web crawler

67

Stars

54

Forks

Watchers

Owner
