Importer: url and id type problems

Open gramian opened this issue 1 year ago • 2 comments

ArcadeDB Version:

ArcadeDB Server v24.2.1 (build 5c4448730af1f607dec0b7dfb2e8dffc6b33b3cb/1713169925164/main)

OS and JDK Version:

Running on Mac OS X 12.7.4 - OpenJDK 64-Bit Server VM 17.0.10 (Homebrew)

I found two problems with the importer, particularly when it is invoked via SQL, i.e. IMPORT DATABASE:

  1. [ ] When importing a graph, the url field seems to be required even when a vertices and/or edges file is supplied as a setting. As a result, whatever the url points to is parsed and inserted as Documents, even though this is unnecessary.
  2. [ ] Edges are only linked if the id type of the vertices file matches the type of the from and to fields of the edges file. By default the vertex typeIdType is String, whereas the ids in edge files appear to be treated as Long integers by default, and this is not configurable.

Expected behavior

This:

IMPORT DATABASE WITH vertices="file://vertices.csv", verticesFileType=csv, typeIdProperty=Id, edges="file://edges.csv", edgesFileType=csv, edgeFromField="From", edgeToField="To"

should work.

Actual behavior

Internal error Cannot invoke "com.arcadedb.query.sql.parser.Url.toString(java.util.Map, StringBuilder)" because "this.url" is null
Error on command execution (PostCommandHandler)
java.lang.NullPointerException: Cannot invoke "com.arcadedb.query.sql.parser.Url.toString(java.util.Map, StringBuilder)" because "this.url" is null
	at com.arcadedb.query.sql.parser.ImportDatabaseStatement.toString(ImportDatabaseStatement.java:86)
	at com.arcadedb.query.sql.parser.SimpleNode.toString(SimpleNode.java:105)
	at com.arcadedb.query.sql.executor.SingleOpExecutionPlan.prettyPrint(SingleOpExecutionPlan.java:106)
	at com.arcadedb.server.http.handler.PostCommandHandler.lambda$execute$0(PostCommandHandler.java:117)
	at java.base/java.util.Optional.ifPresent(Optional.java:178)
	at com.arcadedb.server.http.handler.PostCommandHandler.execute(PostCommandHandler.java:117)
	at com.arcadedb.server.http.handler.DatabaseAbstractHandler.execute(DatabaseAbstractHandler.java:100)
	at com.arcadedb.server.http.handler.AbstractServerHttpHandler.handleRequest(AbstractServerHttpHandler.java:127)
	at io.undertow.server.Connectors.executeRootHandler(Connectors.java:393)
	at io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:859)
	at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
	at org.xnio.XnioWorker$WorkerThreadFactory$1$1.run(XnioWorker.java:1282)
	at java.base/java.lang.Thread.run(Thread.java:840)

Steps to reproduce

This works:

IMPORT DATABASE file://empty.csv WITH vertices="file://vertices.csv", verticesFileType=csv, typeIdProperty=Id, typeIdType=Long, edges="file://edges.csv", edgesFileType=csv, edgeFromField="From", edgeToField="To"

but it needs a dummy empty.csv (an empty file) to avoid useless Document insertions, and typeIdType=Long so that the vertex id type matches the type of the edge "from" and "to" fields.
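For illustration, the workaround command above can be assembled from its individual settings, which makes the two extra pieces (the placeholder url and typeIdType=Long) easy to see. This is only a minimal sketch: `build_import_command` is a hypothetical helper, not part of ArcadeDB, and it simply concatenates the values it is given without validating them.

```python
def build_import_command(url, settings):
    """Assemble an IMPORT DATABASE statement (hypothetical helper).

    Values are emitted verbatim, so quoted values must already
    include their double quotes.
    """
    pairs = ", ".join(f"{key}={value}" for key, value in settings.items())
    return f"IMPORT DATABASE {url} WITH {pairs}"


# Settings copied from the working command in this issue.
settings = {
    "vertices": '"file://vertices.csv"',
    "verticesFileType": "csv",
    "typeIdProperty": "Id",
    # Workaround 2: make the vertex id type match the edge from/to type.
    "typeIdType": "Long",
    "edges": '"file://edges.csv"',
    "edgesFileType": "csv",
    "edgeFromField": '"From"',
    "edgeToField": '"To"',
}

# Workaround 1: point the mandatory url at an empty placeholder file.
command = build_import_command("file://empty.csv", settings)
print(command)
```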

PS: I am using vertices.csv and edges.csv.

gramian, Apr 15 '24 12:04

Did you find a workaround for this? I have a large dataset, and mapping String IDs to Long IDs would take considerable compute. I got the error "found schema property Node.Id of type STRING, while analyzing the source type LONG was found".

All the nodes were created but it failed to add any edges.

@lvca Do you have any planned development for configuring link types?

It looks to be related to this hardcoded line: https://github.com/ArcadeData/arcadedb/blob/df72f59e9b9ebb07b0bf0c7adbbd6eedc4bbb44d/integration/src/test/java/com/arcadedb/integration/importer/CSVImporterIT.java#L64

TheBeastCoding, Dec 26 '24 15:12

No, unfortunately not yet.

gramian, Dec 26 '24 18:12
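Until the edge id type is configurable, one possible stopgap is to pre-process the CSV files so that String ids become integers before importing with typeIdType=Long. The sketch below is an assumption-laden illustration, not something confirmed in this thread: `remap_ids` is a hypothetical helper, the column names Id, From and To are taken from the commands above, and it has not been tested against ArcadeDB itself. Both files are streamed row by row, so only the id map (one entry per vertex) is held in memory.

```python
import csv
import io


def remap_ids(vertices_in, edges_in, vertices_out, edges_out,
              id_field="Id", from_field="From", to_field="To"):
    """Replace string vertex ids with sequential integers in both files."""
    id_map = {}

    reader = csv.DictReader(vertices_in)
    writer = csv.DictWriter(vertices_out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # setdefault keeps the number first assigned to a repeated id.
        row[id_field] = id_map.setdefault(row[id_field], len(id_map) + 1)
        writer.writerow(row)

    reader = csv.DictReader(edges_in)
    writer = csv.DictWriter(edges_out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # A KeyError here means an edge refers to an id that is
        # missing from the vertices file.
        row[from_field] = id_map[row[from_field]]
        row[to_field] = id_map[row[to_field]]
        writer.writerow(row)

    return id_map


# Tiny in-memory demonstration with made-up data.
vertices_src = io.StringIO("Id,Name\nalice,Alice\nbob,Bob\n")
edges_src = io.StringIO("From,To\nalice,bob\n")
vertices_dst, edges_dst = io.StringIO(), io.StringIO()

mapping = remap_ids(vertices_src, edges_src, vertices_dst, edges_dst)
print(vertices_dst.getvalue())
print(edges_dst.getvalue())
```

With real data, the four arguments would be file objects opened with `newline=""`, as the `csv` module documentation recommends, and the returned mapping can be saved if the original String ids are needed later.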