
long string problem of node index

Open bogus71 opened this issue 12 years ago • 1 comment

I got the following error when running batch-import with long strings (e.g. 100000 characters) in a node index file. I got no error with the old batch-import-jar-with-dependencies.jar file.

E:\work\batch-import>mvn clean compile exec:java -Dexec.mainClass="org.neo4j.batchimport.Importer" -Dexec.args="neo4j/data/graph.db nodes.csv rels.csv node_index users fulltext nodes_index.csv"
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Neo4j Batch Importer 1.9-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ batch-import ---
[INFO] Deleting E:\work\batch-import\target
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ batch-import ---
[WARNING] Using platform encoding (MS949 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 1 resource
[INFO]
[INFO] --- maven-compiler-plugin:2.1:compile (default-compile) @ batch-import ---
[WARNING] File encoding has not been set, using platform encoding MS949, i.e. build is platform dependent!
[INFO] Compiling 51 source files to E:\work\batch-import\target\classes
[INFO]
[INFO] >>> exec-maven-plugin:1.2.1:java (default-cli) @ batch-import >>>
[INFO]
[INFO] <<< exec-maven-plugin:1.2.1:java (default-cli) @ batch-import <<<
[INFO]
[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ batch-import ---
Usage java -jar batchimport.jar data/dir nodes.csv relationships.csv [node_index node-index-name fulltext|exact nodes_index.csv rel_index rel-index-name fulltext|exact rels_index.csv ....]
Using Existing Configuration File

Importing 97044 Nodes took 0 seconds
................
Importing 1628244 Relationships took 5 seconds

Total import time: 10 seconds
[WARNING] java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 32768
    at org.neo4j.batchimport.utils.Chunker.nextWord(Chunker.java:53)
    at org.neo4j.batchimport.importer.ChunkerLineData.nextWord(ChunkerLineData.java:37)
    at org.neo4j.batchimport.importer.ChunkerLineData.readLine(ChunkerLineData.java:47)
    at org.neo4j.batchimport.importer.AbstractLineData.parse(AbstractLineData.java:118)
    at org.neo4j.batchimport.importer.AbstractLineData.processLine(AbstractLineData.java:67)
    at org.neo4j.batchimport.Importer.importIndex(Importer.java:145)
    at org.neo4j.batchimport.Importer.importIndex(Importer.java:178)
    at org.neo4j.batchimport.Importer.doImport(Importer.java:192)
    at org.neo4j.batchimport.Importer.main(Importer.java:74)
    ... 6 more
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17.570s
[INFO] Finished at: Fri Jun 28 10:41:53 KST 2013
[INFO] Final Memory: 19M/361M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:java (default-cli) on project batch-import: An exception occured while executing the Java class. null: InvocationTargetException: 32768 -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

bogus71, Jun 28 '13 01:06

The CSV parser changed. It uses a 10k buffer by default.

May I ask what kind of data you're storing in the nodes?

You can configure the importer to use a slower parser that conforms to the CSV format:

batch_import.csv.quotes=true

See https://github.com/jexp/batch-import#csv-experimental

jexp, Jun 28 '13 06:06
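
For readers wondering why a buffer-based parser fails this way, here is a minimal sketch of the failure mode. It is not the actual `Chunker` code from batch-import: the class `FixedBufferDemo`, its methods, and the buffer size of 32768 (picked only to match the index in the stack trace above) are all assumptions made for illustration. It shows how reading one field into a fixed-size `char[]` without a bounds check overflows on a 100000-character value, while a growing buffer does not.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

public class FixedBufferDemo {
    // Hypothetical size, chosen to match the index reported in the exception.
    static final int BUF_SIZE = 32768;

    /** Reads one tab-delimited field into a fixed-size buffer, with no bounds check. */
    static String readFieldFixed(Reader in) throws IOException {
        char[] buffer = new char[BUF_SIZE];
        int count = 0;
        int ch;
        while ((ch = in.read()) != -1 && ch != '\t' && ch != '\n') {
            // Throws ArrayIndexOutOfBoundsException once count reaches BUF_SIZE.
            buffer[count++] = (char) ch;
        }
        return new String(buffer, 0, count);
    }

    /** Same loop, but the StringBuilder grows as needed, so long fields are fine. */
    static String readFieldGrowing(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int ch;
        while ((ch = in.read()) != -1 && ch != '\t' && ch != '\n') {
            sb.append((char) ch);
        }
        return sb.toString();
    }

    /** Builds a string of n copies of c (kept manual so it runs on old JDKs too). */
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) throws IOException {
        String longField = repeat('x', 100000);

        // The growing reader handles the whole field.
        System.out.println("growing reader read "
                + readFieldGrowing(new StringReader(longField)).length() + " chars");

        // The fixed-size reader overflows on the same input.
        try {
            readFieldFixed(new StringReader(longField));
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("fixed reader overflowed its " + BUF_SIZE + "-char buffer");
        }
    }
}
```

A fixed buffer avoids reallocation and is faster for typical short fields, which fits the trade-off described above: the fast parser assumes bounded field lengths, and the slower CSV-conforming parser is the fallback for data that does not fit.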