elasticsearch-river-web
EAS returns gibberish
Hello,
I am using EAS v1.3.2 and river-web v1.3.0, and I get gibberish for some of the fields I crawl. Note: this worked fine with EAS 1.1.1 and river-web 1.1.2, so I think it is related to the versions of Elasticsearch and the river. Thank you in advance.
Could you provide more info to reproduce the problem?
My config file:
{
  "type": "web",
  "crawl": {
    "index": "xx",
    "type": "xx",
    "url": ["xx"],
    "includeFilter": ["xx/.*"],
    "maxDepth": 1,
    "maxAccessCount": 5000,
    "numOfThread": 5,
    "interval": 1000,
    "userAgent": "Elasticsearch crawler",
    "overwrite": true,
    "target": [{
      "pattern": {
        "url": "xx/detail.*",
        "mimeType": "text/html"
      },
      "properties": {
        "id": {
          "text": "div.page ul li.id"
        },
        "title": {
          "text": "title"
        },
        "path": {
          "text": "div.page ul li.path"
        },
        "lang_id": {
          "text": "div.page ul li.lang-id"
        },
        "lang_code": {
          "text": "div.page ul li.lang-code"
        },
        "site_id": {
          "html": "div.page ul li.site-id",
          "script": "value = (int)value"
        },
        "content": {
          "text": "div.page div.content"
        },
        "suggest": {
          "text": "div.page div.content"
        }
      }
    }]
  }
}
The id field comes back as a random string instead of a number, and site_id has a completely wrong value.
I have tried all of the latest versions of EAS. The problem occurs with EAS >= 1.3.0; with EAS v1.2.4 it works fine.
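To narrow down which fields are affected, a quick check along the following lines might help. This is only a sketch. It assumes the indexed documents have already been fetched (for example from a search response) as a list of `_source` dictionaries. The function name `find_non_numeric` and the sample values are hypothetical and are not part of river-web or Elasticsearch.

```python
import re

# Matches an optionally signed integer, allowing surrounding whitespace.
NUMERIC = re.compile(r"^\s*-?\d+\s*$")


def find_non_numeric(sources, fields=("id", "site_id")):
    """Return (doc_index, field, value) for each value that is not an integer.

    `sources` is assumed to be a list of `_source` dicts taken from the
    crawled index; `fields` lists the properties expected to be numeric.
    """
    problems = []
    for i, source in enumerate(sources):
        for field in fields:
            value = source.get(field)
            is_int = isinstance(value, int) and not isinstance(value, bool)
            is_int_string = isinstance(value, str) and NUMERIC.match(value) is not None
            if not (is_int or is_int_string):
                problems.append((i, field, value))
    return problems


# Hypothetical sample data: the second document has a garbled id.
sample = [
    {"id": "42", "site_id": 3},
    {"id": "a9f3c0", "site_id": "7"},
]
print(find_non_numeric(sample))  # [(1, 'id', 'a9f3c0')]
```

Running the same check against documents indexed by a working version (1.2.4) and a failing one (>= 1.3.0) would show whether only the numeric fields are affected.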