Websites crawler with built-in exploration and control web interface

59 hyphe issues

When dealing with a large number of pages, the web entity folder view can take a very long time to display. In my case I...

fine tuning
web interface

![image](https://user-images.githubusercontent.com/193478/55246038-ab7f2100-5244-11e9-874f-b67f08a87452.png) The closing `"` of `href="url"` is kept when the link is parsed.
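One way to avoid keeping the closing quote is to read `href` values through a real HTML parser rather than by slicing the raw tag text. The sketch below uses Python's standard `html.parser`, which hands attribute values to `handle_starttag` with their surrounding quotes already removed. The names `LinkExtractor` and `extract_links` are hypothetical; this is not hyphe's actual parsing code.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper, not hyphe's code)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # The parser delivers attribute values without their surrounding quotes
            if name == "href" and value:
                self.links.append(value.strip())


def extract_links(html):
    """Return every href found in <a> tags, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


print(extract_links('<p><a href="http://example.com/page">x</a></p>'))
# ['http://example.com/page']
```

Both single- and double-quoted attributes come out clean, so no per-quote-style special case is needed.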

web interface
bug

TODO:
- [x] download Chrome headless + driver
- [x] remove phantom binary
- [x] plug Chrome into Selenium in the scrapyd spider
- [x] include install in the Docker build
- ...

The recrawl process first asks for the crawl limits (see #158). There is also an option to avoid downloading pages that were already downloaded (set to true by default). This process will...
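The "skip already downloaded pages" option amounts to filtering the recrawl queue against the set of pages already stored. A minimal sketch, assuming the corpus can provide that set of URLs; `pages_to_recrawl` and its parameters are hypothetical names, not part of hyphe's API.

```python
from typing import Iterable, List, Set


def pages_to_recrawl(candidates: Iterable[str], already_downloaded: Set[str],
                     skip_downloaded: bool = True) -> List[str]:
    """Return the URLs a recrawl should fetch (hypothetical helper).

    skip_downloaded mirrors the option described above and defaults to True:
    when set, pages already in the corpus are not fetched again.
    """
    seen = set()
    result = []
    for url in candidates:
        if url in seen:
            continue  # never queue the same URL twice
        seen.add(url)
        if skip_downloaded and url in already_downloaded:
            continue  # already stored, so skip it
        result.append(url)
    return result
```

Order is preserved, so the crawl limits chosen in the first step can still be applied to the front of the returned list.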

It would be useful to extract clean textual content from each web page. We could use Boilerpipe, for instance: https://github.com/kohlschutter/boilerpipe
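Boilerpipe itself is a Java library that classifies text blocks using shallow text features. For a rough idea of what "clean text" extraction involves, here is a much simpler, swapped-in heuristic (not Boilerpipe's algorithm) using only Python's standard `html.parser`: drop the contents of tags assumed to be boilerplate, split the remaining text into blocks at block-level tags, and keep only blocks with at least `min_words` words. The tag lists, the `min_words` threshold, and the names `TextBlocks` and `clean_text` are all assumptions for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as boilerplate (an assumption, not Boilerpipe's rules)
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}
# Tags that start or end a block of text
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6",
              "td", "article", "section", "br"}


class TextBlocks(HTMLParser):
    """Split a page into whitespace-normalised text blocks (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate tag
        self.blocks = []
        self.current = []

    def _flush(self):
        # Join the pending text fragments and collapse runs of whitespace
        text = " ".join(" ".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)

    def close(self):
        super().close()
        self._flush()  # keep any trailing text


def clean_text(html, min_words=5):
    """Keep only text blocks long enough to look like content."""
    parser = TextBlocks()
    parser.feed(html)
    parser.close()
    return "\n".join(b for b in parser.blocks if len(b.split()) >= min_words)


page = ("<html><head><style>p{color:red}</style></head><body>"
        "<nav>Home About Contact</nav>"
        "<p>This is the main article text of the page.</p>"
        "<footer>Copyright 2019</footer><p>Short</p></body></html>")
print(clean_text(page))
# This is the main article text of the page.
```

A dedicated library such as Boilerpipe should do much better on real pages; this sketch only shows where such a step would sit in the pipeline.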

discussion
feature
core

such as tooltips, or text for each box and entry

fine tuning
web interface

We need three settings for web entity crawls:
- *depth*: integer or infinity (only if *number of pages* is not infinite)
- *number of pages*: integer or infinity...
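The constraint between the first two settings (depth may only be infinite when the number of pages is finite) can be enforced when the settings are built. A minimal sketch, assuming `math.inf` stands for "infinity"; `CrawlLimits` and its field names are hypothetical, and the third setting, which is cut off in the issue, is deliberately left out.

```python
import math
from dataclasses import dataclass
from typing import Union

Limit = Union[int, float]  # a non-negative int, or math.inf for "infinity"


def _is_valid_limit(value: Limit) -> bool:
    """True for math.inf or a non-negative int (bools rejected)."""
    if value == math.inf:
        return True
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


@dataclass
class CrawlLimits:
    """Hypothetical container for the depth and number-of-pages settings."""
    depth: Limit
    max_pages: Limit

    def __post_init__(self):
        for name in ("depth", "max_pages"):
            if not _is_valid_limit(getattr(self, name)):
                raise ValueError(f"{name} must be a non-negative integer or infinity")
        # Infinite depth is only allowed when the number of pages is bounded
        if self.depth == math.inf and self.max_pages == math.inf:
            raise ValueError("depth can only be infinite if the number of pages is finite")
```

For example, `CrawlLimits(depth=math.inf, max_pages=500)` is accepted, while `CrawlLimits(depth=math.inf, max_pages=math.inf)` raises `ValueError`, so an unbounded crawl cannot be requested by accident.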

discussion
crawler

Which could be justified (as in "0 elements in the elements selected were tagged with that particular tag before you clicked"), but it gets awkward as you can tag the...

web interface
bug