ONE SHOT
Hi marevol, I have the same problem when maxAccessCount is reached: each crawl indexes duplicate URLs. ``` json { "_index": "webcrawler_index", "_type": "website", "_id": "AUpGekQeLK-5UwiTrXu6", "_score": 1, "_source": { "method": "GET", "contentLength":...
The mapping: ``` json curl -XPUT "localhost:9200/webcrawler_index/website/_mapping" -d ' { "website" : { "dynamic_templates" : [ { "url" : { "match" : "url", "mapping" : { "type" : "string",...
``` json "crawl" : { "index" : "webcrawler_index", "url" : ["https://www.[...].nc/"], "includeFilter" : ["https://www.[...].nc/*"], "maxDepth" : 3, "maxAccessCount" : 50, "numOfThread" : 10, "overwrite" : true, "userAgent" : "Mozilla/5.0 (Windows...
Hi Shinsuke, I installed your new release and it seems to be OK. Question: why does "overwrite" delete and insert instead of replacing? I use Elasticsearch as a NoSQL database...
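To make the question concrete, here is a minimal sketch of the difference I mean. It does not use the river plugin or an Elasticsearch client: `FakeIndex`, `doc_id_for`, `crawl_twice`, and the example URL are all hypothetical names made up for illustration. The only behaviour it relies on is that indexing a document under an existing `_id` replaces that document, while indexing without an `_id` creates a new one each time.

```python
import hashlib


def doc_id_for(url: str) -> str:
    """Derive a stable document ID from a URL (hypothetical scheme)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


class FakeIndex:
    """In-memory stand-in for an index; NOT an Elasticsearch client."""

    def __init__(self):
        self.docs = {}
        self._auto = 0

    def index(self, source, doc_id=None):
        # Without an explicit ID, a fresh one is generated on every call,
        # so the same URL ends up stored more than once.
        if doc_id is None:
            self._auto += 1
            doc_id = f"auto-{self._auto}"
        # With an explicit ID, an existing document is replaced in place.
        self.docs[doc_id] = source
        return doc_id


def crawl_twice(use_url_id: bool) -> dict:
    """Index the same (made-up) URL in two consecutive crawl runs."""
    idx = FakeIndex()
    url = "https://example.invalid/page"
    for run in (1, 2):
        doc = {"url": url, "run": run}
        idx.index(doc, doc_id_for(url) if use_url_id else None)
    return idx.docs
```

With `crawl_twice(False)` the index ends up holding two documents for the same URL; with `crawl_twice(True)` it holds one, carrying the data from the second run. Whether the plugin can work this way is a question for the author, this is only how I would expect a "replace" to behave.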
With the lastModified field: ``` json { "type" : "web", "crawl" : { "index" : "webcrawler_index", "url" : [ "http://www.[...].nc/discover/" ], "includeFilter" : [ "http://www.[...].nc/discover/[0-9]+/", "http://www.[...].nc/[^/\\?]+/" ], "maxDepth" :...
I have a dynamic template: ``` json { "website" : { "dynamic_templates" : [ { "url" : { "match" : "url", "mapping" : { "type" : "string", "store" :...