elasticsearch-jdbc
Issue while importing data from MySQL
Hello, I am using the JDBC importer to import a large dataset of about 38,000 records from MySQL by performing some joins. The data is imported and indexed into Elasticsearch, but not all records end up in the index; some are left out. For 38,000 records it inserts only 28,385. Why is that? Is data lost during the process? Please help.
Thanks
No, data is not lost.
You should provide more detailed information about your problem. Set the log level to debug and find out when the JDBC importer submits bulk requests.
Hello, since last week I have been stuck on the same problem and I cannot find the cause. I am using this script with an integer value as the ID, but it imports only 27,385 records while MySQL has 38,485. Secondly, when I do not specify an ID, i.e. when Elasticsearch auto-generates the ID, it imports extra records, e.g. 40,499 instead of 38,485. When I add a limit to the query in the script, e.g. LIMIT 500, the index shows 476 records, yet with LIMIT 300 all 300 records appear. Why is that? As I am new to all of this, I cannot work out the reason. Please help if anything is wrong with the script.
Hello, I also have one question: if I update a row in MySQL, is there a way to reflect that change in the Elasticsearch index the next time we run the script?
Thanks
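For what it's worth, one common approach is to combine two things: map a stable MySQL key to `_id` so that re-imported rows overwrite the same Elasticsearch document, and restrict each scheduled run to rows changed since the last execution. A sketch of an importer definition, assuming such a setup: the `updated_at` column, table and connection details are hypothetical placeholders, and the exact shape of the `statement`/`parameter` block may differ between importer versions (check the README for your version):

```json
{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:mysql://localhost:3306/mydb",
    "user": "dbuser",
    "password": "dbpass",
    "schedule": "0 0/15 * * * ?",
    "statement": [
      {
        "statement": "select *, id as _id from mytable where updated_at > ?",
        "parameter": ["$metrics.lastexecutionstart"]
      }
    ],
    "index": "myindex",
    "type": "mytype"
  }
}
```

With `id as _id`, an updated row is indexed again under the same document ID, so the next run replaces the stale document instead of adding a duplicate. Deletes are not covered by this approach.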
"Set log level to debug and find out when JDBC importer submit bulk requests" — I don't understand what should be done. Can you elaborate on what to do? How do I set that level?
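For reference, the importer's logging is controlled by the log4j configuration file shipped with the distribution (`log4j2.xml` in newer releases, `log4j.properties` in older ones — check your install). A sketch of a `log4j2.xml` fragment; the logger names are assumed from the prefixes visible in the output above:

```xml
<!-- log4j2.xml fragment: raise the importer loggers to debug -->
<Loggers>
  <Logger name="importer.jdbc" level="debug"/>
  <Logger name="org.xbib" level="debug"/>
  <Root level="info">
    <AppenderRef ref="Console"/>
  </Root>
</Loggers>
```

With debug enabled you should see each bulk request being built and submitted, which makes it possible to match submitted rows against what arrives in the index.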
I can only help if there is enough information, but this is not the case. What script do you mean?
Please provide the log files if possible. In the log files you can see how many rows are submitted. If that number does not match the number of documents in ES, you have submitted some docs twice or more with the same ID. I do not know your data, so I cannot tell what you did with it.
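A quick way to test the "same ID submitted twice" theory is to count duplicates of the ID column over the same joined query the importer runs. A sketch — the table and column names here are placeholders, so substitute your own joined SELECT:

```sql
-- Wrap the importer's SELECT in a subquery and count duplicate IDs.
-- Every duplicate row overwrites an earlier document in Elasticsearch,
-- which makes the index smaller than the row count.
SELECT id, COUNT(*) AS occurrences
FROM (
    SELECT t1.id AS id
    FROM table1 t1
    JOIN table2 t2 ON t2.table1_id = t1.id
) joined
GROUP BY id
HAVING COUNT(*) > 1;
```

If this returns any rows, the joins are fanning out and several source rows share one `_id`, which would explain an index with fewer documents than rows submitted.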
By script I mean the JDBC importer file where we give the query. Actually, it shows the records as submitted, but they are not in Elasticsearch, even though the data is not duplicated. This is what it shows at the end:
[04:26:02,756][INFO ][metrics.source.plain ][pool-5-thread-1] totalrows = 31090, 10 minutes = 600080 ms, 8153775 = 7.78 MB bytes, 262.0 bytes = 262 avg size, 51.81 dps, 0.013 MB/s
[04:26:02,783][INFO ][metrics.sink.plain ][pool-5-thread-1] 10 minutes = 600207 ms, submitted = 30975, succeeded = 27667, failed = 3251, 17944214 = 17.11 MB bytes, 579.0 bytes = 579 avg size, 51.607 dps, 0.029 MB/s
[04:26:32,647][INFO ][metrics.source.plain ][pool-5-thread-1] totalrows = 34734, 10 minutes 30 seconds = 630080 ms, 9096871 = 8.68 MB bytes, 261.0 bytes = 261 avg size, 55.126 dps, 0.014 MB/s
[04:26:32,650][INFO ][metrics.sink.plain ][pool-5-thread-1] 10 minutes 30 seconds = 630074 ms, submitted = 34289, succeeded = 30671, failed = 3561, 19846345 = 18.93 MB bytes, 578.0 bytes = 578 avg size, 54.421 dps, 0.031 MB/s
[04:27:02,696][INFO ][metrics.source.plain ][pool-5-thread-1] totalrows = 39335, 11 minutes = 660143 ms, 10328538 = 9.85 MB bytes, 262.0 bytes = 262 avg size, 59.586 dps, 0.015 MB/s
[04:27:02,842][INFO ][metrics.sink.plain ][pool-5-thread-1] 11 minutes = 660266 ms, submitted = 39331, succeeded = 34981, failed = 4234, 22798346 = 21.74 MB bytes, 579.0 bytes = 579 avg size, 59.568 dps, 0.034 MB/s
[04:27:32,632][INFO ][metrics.source.plain ][pool-5-thread-1] totalrows = 41402, 11 minutes 30 seconds = 690080 ms, 10859994 = 10.36 MB bytes, 262.0 bytes = 262 avg size, 59.996 dps, 0.015 MB/s
[04:27:32,633][INFO ][metrics.sink.plain ][pool-5-thread-1] 11 minutes 30 seconds = 690057 ms, submitted = 41400, succeeded = 36300, failed = 4264, 23985888 = 22.87 MB bytes, 579.0 bytes = 579 avg size, 59.995 dps, 0.034 MB/s
In the JDBC logs:
[42]: index [newindex], type [eventdata], id [320150], message [MapperParsingException[failed to parse [campaign.launchdate]]; nested: MapperParsingException[failed to parse date field [06/18/2015 18:00], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: "06/18/2015 18:00" is malformed at "/18/2015 18:00"]; ]
[04:44:17,285][INFO ][importer.jdbc ][pool-2-thread-2] index name = newindex, concrete index name = newindex
[04:44:17,422][INFO ][importer.jdbc.context.standard][pool-23-thread-1] found sink class org.xbib.elasticsearch.jdbc.strategy.standard.StandardSink@6ea8e6
[04:44:17,444][INFO ][importer.jdbc.context.standard][pool-23-thread-1] found source class org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource@edff04
[04:44:17,457][INFO ][BaseTransportClient ][pool-23-thread-1] creating transport client, java version 1.8.0_45, effective settings {cluster.name=elasticsearch, host.0=localhost, port=9300, sniff=false, autodiscover=false, name=importer, client.transport.ignore_cluster_name=false, client.transport.ping_timeout=5s, client.transport.nodes_sampler_interval=5s}
[04:44:17,489][INFO ][org.elasticsearch.plugins][pool-23-thread-1] [importer] loaded [support-1.7.0.0-8e7ca71], sites []
[04:44:18,751][INFO ][BaseTransportClient ][pool-23-thread-1] trying to connect to [inet[localhost/127.0.0.1:9300]]
[04:44:18,788][INFO ][BaseTransportClient ][pool-23-thread-1] connected to [[Karnak][XwG0JKLxTVSd7NRX9jwdjw][precise32][inet[localhost/127.0.0.1:9300]]]
[04:44:32,632][INFO ][metrics.source.plain ][pool-5-thread-1] totalrows = 155159, 28 minutes 30 seconds = 1710080 ms, 40824554 = 38.93 MB bytes, 263.0 bytes = 263 avg size, 90.732 dps, 0.023 MB/s
[04:44:32,736][INFO ][metrics.sink.plain ][pool-5-thread-1] 28 minutes 30 seconds = 1710160 ms, submitted = 155043, succeeded = 137973, failed = 17013, 89953864 = 85.79 MB bytes, 580.0 bytes = 580 avg size, 90.66 dps, 0.051 MB/s
Thanks
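The failed items in the log above point at a mapping problem rather than data loss: `06/18/2015 18:00` does not match the default `dateOptionalTime` format, so those documents are rejected and counted as `failed`. One possible fix is to create the index with an explicit date format before running the importer. A sketch, assuming Elasticsearch 1.x mapping syntax, with the index, type, and field names taken from the log — PUT this as the mapping for `newindex`:

```json
{
  "mappings": {
    "eventdata": {
      "properties": {
        "campaign": {
          "properties": {
            "launchdate": {
              "type": "date",
              "format": "MM/dd/yyyy HH:mm"
            }
          }
        }
      }
    }
  }
}
```

Alternatively, format the date in SQL (e.g. with `DATE_FORMAT`) so it matches the default mapping.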
I'm experiencing the same problem. Mine occurred when the data in a MySQL table contains a JSON value, for example:
[19:54:16,761][ERROR][org.xbib.elasticsearch.helper.client.BulkTransportClient][elasticsearch[importer][listener][T#1]] bulk [3] failed with 25 failed items, failure message = failure in bulk execution:
[22]: index [tbl_perseroan], type [tbl_perseroan], id [AVRDL1V3wvrO2IR6-xld], message [MapperParsingException[failed to parse [pemegang_saham.data.tanggal_lahir]]; nested: IllegalArgumentException[Invalid format: "08-07-1966" is malformed at "-07-1966"];]
[49]: index [tbl_perseroan], type [tbl_perseroan], id [AVRDL1V3wvrO2IR6-xl4], message [MapperParsingException[object mapping for [pemegang_saham] tried to parse field [pemegang_saham] as object, but found a concrete value]]
[63]: index [tbl_perseroan], type [tbl_perseroan], id [AVRDL1V3wvrO2IR6-xmG], message [MapperParsingException[object mapping for [kegiatan] tried to parse field [kegiatan] as object, but found a concrete value]]
Probably I have to set it to string in the mapping (haven't tried it yet). Update: tried it — I used detect_json and set it to false.
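For anyone else hitting this: a sketch of where the flag goes, assuming `detect_json` lives in the `jdbc` block of the importer definition (the option name is taken from the comment above; the connection details, query, and index names are placeholders — verify against the README for your importer version):

```json
{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:mysql://localhost:3306/mydb",
    "user": "dbuser",
    "password": "dbpass",
    "sql": "select * from tbl_perseroan",
    "detect_json": false,
    "index": "tbl_perseroan",
    "type": "tbl_perseroan"
  }
}
```

With JSON detection off, a column containing JSON text is indexed as a plain string instead of being parsed into an object, which avoids the "tried to parse field as object, but found a concrete value" mapping conflicts.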
With detect_json set to false the insert failures are reduced; now the problem is that the data is too large for a single term:
[118]: index [tbl_perseroan], type [tbl_perseroan], id [AVRD5wj6QWkIBhxC0WKc], message [java.lang.IllegalArgumentException: Document contains at least one immense term in field="pemegang_saham" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[91, 123, 34, 105, 100, 34, 58, 49, 44, 34, 100, 97, 116, 97, 34, 58, 91, 123, 34, 109, 120, 121, 112, 108, 121, 122, 121, 107, 34, 58]...', original message: bytes can be at most 32766 in length; got 94563]
Is there any way to handle this?
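That "immense term" error occurs when a `not_analyzed` string field is indexed as a single term longer than Lucene's 32766-byte limit. Two possible mappings, as a sketch assuming ES 1.x/2.x string mapping syntax with the type and field names taken from the log: either stop indexing the field entirely (it stays retrievable in `_source`), or keep it `not_analyzed` but skip oversized values with `ignore_above`:

```json
{
  "tbl_perseroan": {
    "properties": {
      "pemegang_saham": {
        "type": "string",
        "index": "no"
      }
    }
  }
}
```

The `ignore_above` variant would be `"index": "not_analyzed", "ignore_above": 10922` instead of `"index": "no"` — note that `ignore_above` counts characters, and UTF-8 can use up to 3 bytes per character, hence 32766 / 3 ≈ 10922.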
Hi, I have the same problem. My index is created successfully and no records failed. I limited the SQL to 10,000 rows, but I get submitted = 9462, succeeded = 6871, failed = 0. The SQL has LIMIT 10000, so why did it submit only 9462? And although no records failed, only 6871 succeeded. When I execute the script two or three times, it outputs a different result each time, e.g.:
1st: LIMIT 10000, submitted = 9462, succeeded = 6871, failed = 0
2nd: LIMIT 10000, submitted = 10000, succeeded = 7583, failed = 0
3rd: LIMIT 10000, submitted = 7980, succeeded = 6485, failed = 0
Please help me understand this, thank you so much.
[14:42:21,897][INFO ][metrics.sink.plain ][pool-4-thread-1] 4 minutes 57 seconds = 297796 ms, submitted = 9462, succeeded = 6871, failed = 0, 167357004 = 159.60 MB bytes, 17.27 KB = 17,685 avg size, 31.773 dps, 0.549 MB/s
creating updateno2 index is completed