Aécio Santos

Results 54 comments of Aécio Santos

These are the files related to the ES support:

- [ElasticSearchRestTargetRepository.java](https://github.com/VIDA-NYU/ache/blob/master/ache/src/main/java/achecrawler/target/repository/ElasticSearchRestTargetRepository.java) (the logic for dealing with indexes is here)
- [ElasticSearchClientFactory.java](https://github.com/VIDA-NYU/ache/blob/master/ache/src/main/java/achecrawler/target/repository/elasticsearch/ElasticSearchClientFactory.java)

Currently, it checks if the index exists, and creates one if...

I believe the main thing that breaks compatibility with ES 7+ is the removal of document types, but I haven't had the time to try to fix it yet: https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html...

Great, thanks. Did you have the chance to test whether the web search interface works as well?

Just merged it. Thanks, @JuliusHenke and @chanwitkepha!

This is very similar to the problem reported in issue #186. Is 4g the maximum you can use, or does the same problem also happen when using more memory?

The number of threads is currently hard-coded to be the same as the number of CPU cores here: https://github.com/VIDA-NYU/ache/blob/1ad6a2ccbcadfdbc1b46e24fa4bd7d0505939d35/ache/src/main/java/achecrawler/crawler/async/HttpDownloader.java#L100
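
For readers unfamiliar with the pattern, here is a minimal sketch of sizing a thread pool from the CPU core count, with an optional override. It is not the code in `HttpDownloader.java`; the class `DownloaderPoolSketch` and the helper `resolveThreadCount` are hypothetical names used only for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: NOT the actual ACHE implementation.
public class DownloaderPoolSketch {

    // Hypothetical helper: use the configured value when it is a positive number,
    // otherwise fall back to the number of CPU cores reported by the JVM.
    public static int resolveThreadCount(Integer configured) {
        if (configured != null && configured > 0) {
            return configured;
        }
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) throws InterruptedException {
        // An optional first argument overrides the core-count default.
        Integer configured = args.length > 0 ? Integer.valueOf(args[0]) : null;
        int threads = resolveThreadCount(configured);

        // Fixed-size pool with one worker per resolved thread.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            final int id = i;
            pool.submit(() -> System.out.println("worker " + id + " running"));
        }

        // Stop accepting new tasks and wait briefly for the submitted ones to finish.
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With a fallback like this, a value hard-coded to `availableProcessors()` would explain why the thread count tracks the machine's core count regardless of other settings.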

> I tried to change the # of cores to a 4, but it seems no matter how I set this value the number of dispatcher remains to be 12....

I'd be happy to accept a Pull Request. We won't be able to work on this in the next few weeks (maybe months). The relevant code to start to look...

No. Currently, only two types of crawls (`DeepCrawl` or `FocusedCrawl`), which have specific hard-coded configurations, can be started via the REST API. The entry point for the REST API...

I agree that being able to restart crawls is a good addition. I didn't have the time to test this PR, though it seems to still be a work in...