elasticsearch-river-web

EAS returns gibberish

Open ilazaridis opened this issue 11 years ago • 3 comments

Hello,

I am using EAS v1.3.2 and river-web v1.3.0, and I get gibberish for some of the fields I crawl. Note: this was working fine with EAS 1.1.1 and river-web 1.1.2, so I think it has to do with the versions of elasticsearch and the river. Thank you in advance.

ilazaridis avatar Sep 01 '14 15:09 ilazaridis

Could you provide more info to reproduce the problem?

marevol avatar Sep 01 '14 20:09 marevol

My config file:

{
    "type": "web",
    "crawl": {
        "index": "xx",
        "type": "xx",
        "url": ["xx"],
        "includeFilter": ["xx/.*"],
        "maxDepth": 1,
        "maxAccessCount": 5000,
        "numOfThread": 5,
        "interval": 1000,
        "userAgent": "Elasticsearch crawler",
        "overwrite": true,
        "target": [{
            "pattern": {
                "url": "xx/detail.*",
                "mimeType": "text/html"
            },
            "properties": {
                "id": {
                    "text": "div.page ul li.id"
                },
                "title": {
                    "text": "title"
                },
                "path": {
                    "text": "div.page ul li.path"
                },
                "lang_id": {
                    "text": "div.page ul li.lang-id"
                },
                "lang_code": {
                    "text": "div.page ul li.lang-code"
                },
                "site_id": {
                    "html": "div.page ul li.site-id",
                    "script": "value = (int)value"
                },
                "content": {
                    "text": "div.page div.content"
                },
                "suggest": {
                    "text": "div.page div.content"
                }
            }
        }]
    }
}

With this config, id ends up as a random string instead of a number, and site_id has a completely wrong value.
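One way to narrow down which fields are affected is to check the indexed documents for fields that should be numeric but are not. The following is only a rough sketch: the helper name `non_numeric_fields` and the sample document are hypothetical and made up for illustration, not actual crawler output or part of river-web.

```python
# Hypothetical helper (not part of river-web or elasticsearch):
# returns the fields that were expected to hold integers but do not.
def non_numeric_fields(source, expected_numeric=("id", "site_id")):
    bad = {}
    for name in expected_numeric:
        value = source.get(name)
        try:
            # Accept ints and integer-like strings such as " 12 ".
            int(str(value).strip())
        except (TypeError, ValueError):
            bad[name] = value
    return bad


# Made-up document for illustration only, not real crawl output.
sample = {"id": "x9$k", "site_id": "12", "title": "Example"}
print(non_numeric_fields(sample))
```

Running it over the `_source` of a few documents from each EAS version should show whether the same fields are garbled every time.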

ilazaridis avatar Sep 02 '14 07:09 ilazaridis

I have tried all of the latest versions of EAS. The problem occurs with EAS >= 1.3.0; with EAS v1.2.4 it works fine.

ilazaridis avatar Sep 02 '14 10:09 ilazaridis