DataX
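ElasticsearchWriter: a field with type date and format "yyyy-MM-dd HH:mm:ss" is converted to the wrong format
I am syncing data from MySQL to Elasticsearch. The target Elasticsearch field has the format yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis. In the DataX ElasticsearchWriter configuration the field has type date and format "yyyy-MM-dd HH:mm:ss". Running the job produces this error: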
[0-0-0-writer] ERROR StdoutPluginCollector - 脏数据: {"message":"status:[400], error: {"type":"mapper_parsing_exception","reason":"failed to parse field [update_time] of type [date] in document with id '5'. Preview of field's value: '2021-04-19T10:43:36.000+08:00'","caused_by":{"type":"illegal_argument_exception","reason":"failed to parse date field [2021-04-19T10:43:36.000+08:00] with format [yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis]","caused_by":{"type":"date_time_parse_exception","reason":"Failed to parse with all enclosed parsers"}}}","record":[{"byteSize":1,"index":0,"rawData":1,"type":"LONG"},{"byteSize":0,"index":1,"rawData":"","type":"STRING"},{"byteSize":9,"index":2,"rawData":"yicai.com","type":"STRING"},{"byteSize":8,"index":3,"rawData":1618800208000,"type":"DATE"},{"byteSize":5,"index":4,"rawData":"中国,上海","type":"STRING"},{"byteSize":1,"index":5,"rawData":5,"type":"LONG"},{"byteSize":5,"index":6,"rawData":"第一财经网","type":"STRING"},{"byteSize":2,"index":7,"rawData":"{}","type":"STRING"},{"byteSize":8,"index":8,"rawData":1618800216000,"type":"DATE"}],"type":"writer"}
Tracing the code led me to the com.alibaba.datax.plugin.writer.elasticsearchwriter.ESWriter class:

    private String getDateStr(ESColumn esColumn, Column column) {
        DateTime date = null;
        DateTimeZone dtz = DateTimeZone.getDefault();
        if (esColumn.getTimezone() != null) {
            // For all time zone IDs, see http://www.joda.org/joda-time/timezones.html
            dtz = DateTimeZone.forID(esColumn.getTimezone());
        }
        if (column.getType() != Column.Type.DATE && esColumn.getFormat() != null) {
            DateTimeFormatter formatter = DateTimeFormat.forPattern(esColumn.getFormat());
            date = formatter.withZone(dtz).parseDateTime(column.asString());
            return date.toString();
        } else if (column.getType() == Column.Type.DATE) {
            date = new DateTime(column.asLong(), dtz);
            return date.toString();
        } else {
            return column.asString();
        }
    }

This code does not apply the configured format when ESWriter converts date values. Two questions:

Should the condition column.getType() != Column.Type.DATE in the first branch be column.getType() == Column.Type.DATE?

Should date.toString() be date.toString(formatter)? According to the Joda-Time documentation, date.toString() will "Output the date time in ISO8601 format (yyyy-MM-ddTHH:mm:ss.SSSZZ)", which matches the value '2021-04-19T10:43:36.000+08:00' that Elasticsearch rejects in the error above.
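To make the mismatch concrete, here is a minimal sketch that reproduces both renderings of the update_time value from the log (rawData 1618800216000). It is not the DataX code: it uses java.time rather than Joda-Time so that it runs on a plain JDK, and the class name, the helper names, and the Asia/Shanghai zone are all assumptions made for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Illustrative only: DateFormatSketch and its methods are hypothetical names,
// not part of DataX or the Elasticsearch writer plugin.
class DateFormatSketch {

    // Render epoch milliseconds in the given zone using an explicit pattern,
    // which is what passing a formatter to toString(...) would achieve.
    static String formatWithPattern(long epochMillis, String pattern, String zoneId) {
        ZonedDateTime dt = Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId));
        return DateTimeFormatter.ofPattern(pattern).format(dt);
    }

    // Approximate the ISO 8601 shape that a no-argument toString() produces
    // (yyyy-MM-ddTHH:mm:ss.SSS plus a +HH:mm offset).
    static String formatIsoLike(long epochMillis, String zoneId) {
        return formatWithPattern(epochMillis, "yyyy-MM-dd'T'HH:mm:ss.SSSXXX", zoneId);
    }

    // Check whether a string parses as a local date-time with the given pattern,
    // similar in spirit to the strict format check Elasticsearch applies.
    static boolean matchesPattern(String value, String pattern) {
        try {
            LocalDateTime.parse(value, DateTimeFormatter.ofPattern(pattern));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long updateTime = 1618800216000L;      // rawData of column index 8 in the log
        String zone = "Asia/Shanghai";         // assumed; the log shows a +08:00 offset
        String pattern = "yyyy-MM-dd HH:mm:ss";

        String iso = formatIsoLike(updateTime, zone);
        String formatted = formatWithPattern(updateTime, pattern, zone);

        System.out.println(iso);                               // 2021-04-19T10:43:36.000+08:00
        System.out.println(formatted);                         // 2021-04-19 10:43:36
        System.out.println(matchesPattern(iso, pattern));      // false
        System.out.println(matchesPattern(formatted, pattern)); // true
    }
}
```

The ISO-style string does not match yyyy-MM-dd HH:mm:ss, and it is not a plain epoch_millis number or a bare yyyy-MM-dd date either, which is consistent with the "Failed to parse with all enclosed parsers" message.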
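I tried date.toString(formatter), but once the data reached Elasticsearch (I am on Elasticsearch 7.x) the field's type had become text, which does not meet the requirement. I have only just started using DataX and am not good at debugging it yet. I think the first thing to check is the date format coming out of the reader. The write path does create the index, but it only creates the index and has no method for adding a mapping. It may also be that calling toString() during the write changes the type, since Elasticsearch mapping is dynamic. I have not studied this part closely; it needs to be verified against the documentation. For now I have successfully synced mysql5.7 -> elasticsearch7.x.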
Same problem here: date parsing fails. Has this been resolved?
If you need this, write your own implementation, or take a look at https://github.com/alibaba/DataX/issues/867, where someone has uploaded a packaged build.
reader: cast create_time to char. elasticsearchwriter: set the type of create_time to keyword. You do not need to worry about which type create_time was given when the index was created. This is how I got it to work.
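By Alibaba's convention, the time written to ES is in the international standard format (ISO 8601). If you want it written as yyyy-MM-dd HH:mm:ss instead, just remove "type": "date" from the field configuration.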