Aécio Santos
**Describe the solution you'd like** `Phi_k` is a new correlation coefficient between categorical, ordinal, and interval variables with Pearson characteristics. Paper: https://arxiv.org/pdf/1811.11440.pdf `Phi_k` seems to be based on the Chi2...
Integration test RobotsAndSitemapTest is non-deterministic and intermittently fails on the continuous integration server. Example stacktraces: ``` focusedCrawler.integration.RobotsAndSitemapTest > test1ToNotToDownloadSitesDisallowedOnRobots FAILED java.lang.AssertionError: URL=http://127.0.0.1:1234/disallowed-link-1.html Expected: is not a value less than...
Related to issue #147.
With these changes, all open database iterators are closed before closing the main database to avoid an inconsistent state with invalid database iterators. Related to issue #113.
- Modified PolitenessScheduler to compute the delay between same-domain requests based on the time when the download finished
- Refactored FetchedResultHandler to simply notify the LinkStorage that the download...
The current implementation is messy and very hard to maintain or change. The new implementation should be compatible with the current one and add new features:
- [x] Should normalize relative links
-...
Currently, the delay between requests to the same domain (host) considers only the time when the URLs were scheduled to be downloaded, not the time when the download was finished....
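The difference between the two timing strategies can be illustrated with a minimal Java sketch. It is not the project's actual `PolitenessScheduler`; the class name `PolitenessDelaySketch` and its methods are hypothetical and chosen only for illustration. The sketch records the time at which the last download for a domain finished and only allows the next request once a minimum delay has elapsed since that moment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the real PolitenessScheduler: the delay is
// measured from the time the previous download FINISHED, not from the
// time it was scheduled.
final class PolitenessDelaySketch {

    private final long minDelayMs;
    // Last known download-finish timestamp (ms) per domain.
    private final Map<String, Long> lastFinishedMs = new HashMap<>();

    PolitenessDelaySketch(long minDelayMs) {
        this.minDelayMs = minDelayMs;
    }

    // Called when a download for the given domain completes.
    // Keeps the latest finish time if several are reported.
    void recordDownloadFinished(String domain, long finishedAtMs) {
        lastFinishedMs.merge(domain, finishedAtMs, Math::max);
    }

    // Earliest time (ms) at which the domain may be requested again;
    // 0 if no download has finished for it yet.
    long nextAllowedTime(String domain) {
        Long finished = lastFinishedMs.get(domain);
        return finished == null ? 0L : finished + minDelayMs;
    }

    boolean canDownload(String domain, long nowMs) {
        return nowMs >= nextAllowedTime(domain);
    }
}
```

With this approach a slow download pushes the next request further out, so the server always gets at least `minDelayMs` of idle time between consecutive requests.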