
CSV importer bug

Open ruispereira opened this issue 6 months ago • 18 comments

I tried to insert millions of records into the database, using CSV files previously generated with a Python script. I could insert the vertices the script generated without issues; the problem is when I try to insert the edges, where an error prevents me from inserting them: "edge type 'Belongs' not defined".

Steps to reproduce:

  • apply the schema first (vertex types, edge types, indexes, and so on)
  • create the vertices
  • create the edges (fails)
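The original Python generator was not shared. As a hypothetical sketch, assuming a single id column per vertex file and an edge file whose rows are (tenant id, supervisor id) pairs like the 0,10 sample shown later in the thread, a generator that writes a header row into every file, edges.csv included, could look like this. The function name and column layout are assumptions, not the reporter's script.

```python
import csv
from pathlib import Path


def write_import_files(out_dir, tenant_ids, supervisor_ids, links):
    """Write tenants.csv, supervisors.csv and edges.csv, each starting with a header row.

    Hypothetical layout: vertex files carry a single "id" column; links is an
    iterable of (tenant_id, supervisor_id) pairs for the Belongs edges.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    files = {
        "tenants.csv": (["id"], ((i,) for i in tenant_ids)),
        "supervisors.csv": (["id"], ((i,) for i in supervisor_ids)),
        # These header names are what edgeFromField / edgeToField refer to.
        "edges.csv": (["Tenant", "Supervisor"], links),
    }
    for name, (header, rows) in files.items():
        with open(out / name, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            # writerows consumes rows lazily, so generators keep memory flat for large imports.
            writer.writerows(rows)
    return out
```

Passing generators instead of lists for the ids and links keeps memory usage flat when producing millions of rows.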

ruispereira, Jun 16 '25 09:06

It could be a semantic issue. Can you share the schema and the commands you sent?

tolgaulas, Jun 16 '25 09:06

Attachments: edges.csv, empty.csv, import-data.sql.txt (rename to .sql and fix the folder path), schema.sql.txt (rename to .sql), supervisors.csv, tenants.csv

ruispereira, Jun 16 '25 10:06

docker exec -it arcadedb bin/console.sh

load import-data.sql

ruispereira, Jun 16 '25 10:06

[Image: screenshot of the server logs]

This is what I have in the logs after trying to import the edges.

If you check the schema, you will see the edge type is created. Also check import-data.sql, which contains the SQL commands I run.

I have the commands separated, but I also tried importing vertices and edges in the same command.

ruispereira, Jun 16 '25 10:06

Additionally, the current head produces either "Error on transaction execution" or "Transaction not begun" for IMPORT DATABASE, depending on whether file:// is used for the empty.csv path.

gramian, Jun 16 '25 10:06

Yes, I had to create an empty.csv to run the command.

ruispereira, Jun 16 '25 10:06

For importing the edges:

  1. Add a header line to edges.csv, for example Tenant,Supervisor (or use the edgesHeader setting),
  2. in the IMPORT DATABASE command, use edgeFromField=Supervisor and edgeToField=Tenant,
  3. and do not pass typeIdProperty=id, typeIdType=Long.
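For step 1 on a file with millions of rows, the header can be prepended without loading the whole file into memory. A minimal sketch, assuming the Tenant,Supervisor header from the example above; the helper name ensure_header is hypothetical, and the edgesHeader setting mentioned in step 1 is the alternative that avoids rewriting the file at all.

```python
import os
import shutil
import tempfile


def ensure_header(csv_path, header="Tenant,Supervisor"):
    """Prepend `header` to csv_path unless its first line already equals it.

    The rest of the file is copied in chunks into a temporary file in the same
    directory, which then atomically replaces the original.
    Returns True if the header was added, False if it was already present.
    """
    with open(csv_path, "r", newline="") as src:
        first = src.readline().rstrip("\r\n")
        if first == header:
            return False  # idempotent: running it twice does not add a second header
        src.seek(0)
        dir_name = os.path.dirname(os.path.abspath(csv_path))
        fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
        with os.fdopen(fd, "w", newline="") as dst:
            dst.write(header + "\n")
            shutil.copyfileobj(src, dst)  # streams the existing rows after the header
    os.replace(tmp_path, csv_path)  # atomic swap on the same filesystem
    return True
```

Because the check compares the first line, the helper can safely be rerun before every import attempt.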

gramian, Jun 16 '25 10:06

I can't comment on the code side, but I noticed the vertex files have a header and the edge file doesn't. Could that be it?

tolgaulas, Jun 16 '25 10:06

I had a header previously and still got an error while importing the edges, but I will add it back and try again.

ruispereira, Jun 16 '25 10:06

cat edges.csv
Tenant,Supervisor
0,10
...

Like this, right? And for edgeFromField and edgeToField:

IMPORT DATABASE file:///home/arcadedb/bug/empty.csv WITH edges="file:///home/arcadedb/bug/edges.csv", edgesFileType=csv, edgeType="Belongs", edgeFromField=Tenant, edgeToField=Supervisor;

ruispereira, Jun 16 '25 10:06

Console:

{test123}> IMPORT DATABASE file:///home/arcadedb/bug/empty.csv WITH edges="file:///home/arcadedb/bug/edges.csv", edgesFileType=csv, edgeType="Belongs", edgeFromField=Tenant, edgeToField=Supervisor
ERROR: com.arcadedb.remote.RemoteException: Error on executing remote operation IMPORT DATABASE file:///home/arcadedb/bug/empty.csv WITH edges="file:///home/arcadedb/bug/edges.csv", edgesFileType=csv, edgeType="Belongs", edgeFromField=Tenant, edgeToField=Supervisor (cause:com.arcadedb.integration.importer.ImportException detail:Error on parsing source 'file:///home/arcadedb/bug/edges.csv (compressed=false size=48)')
    at com.arcadedb.remote.RemoteHttpComponent.manageException(RemoteHttpComponent.java:500)

Logs:

2025-06-16 10:32:01.063 INFO [SourceDiscovery] <ArcadeDB_0> Analyzing url: file:///home/arcadedb/bug/empty.csv...
2025-06-16 10:32:01.069 INFO [SourceDiscovery] <ArcadeDB_0> Recognized format CSV (parsingLimitBytes=9.54MB parsingLimitEntries=0)
2025-06-16 10:32:01.069 INFO [Importer] <ArcadeDB_0> Checking schema...
2025-06-16 10:32:01.071 INFO [CSVImporterFormat] <ArcadeDB_0> Started importing documents from CSV source
2025-06-16 10:32:01.074 INFO [CSVImporterFormat] <ArcadeDB_0> Importing the following document properties: []
2025-06-16 10:32:01.077 INFO [CSVImporterFormat] <ArcadeDB_0> Importing of documents from CSV source completed in 0 seconds (0/sec)
2025-06-16 10:32:01.077 INFO [CSVImporterFormat] <ArcadeDB_0> - Parsed lines...: 0
2025-06-16 10:32:01.077 INFO [CSVImporterFormat] <ArcadeDB_0> - Total documents: 0
2025-06-16 10:32:01.077 INFO [SourceDiscovery] <ArcadeDB_0> Analyzing url: file:///home/arcadedb/bug/edges.csv...
2025-06-16 10:32:01.079 INFO [CSVImporterFormat] <ArcadeDB_0> Reading header from 1st line in data file: [Tenant, Supervisor]
2025-06-16 10:32:01.080 INFO [SourceDiscovery] <ArcadeDB_0> Recognized format CSV (parsingLimitBytes=9.54MB parsingLimitEntries=0)
2025-06-16 10:32:01.080 INFO [Importer] <ArcadeDB_0> Checking schema...
2025-06-16 10:32:01.083 INFO [CSVImporterFormat] <ArcadeDB_0> Started importing edges from CSV source (expectedVertices=4 expectedEdges=8)
2025-06-16 10:32:01.084 INFO [CSVImporterFormat] <ArcadeDB_0> Importing the following edge properties: [Tenant, Supervisor]
2025-06-16 10:32:01.387 INFO [CSVImporterFormat] <ArcadeDB_0> Importing of edges from CSV source completed in 0 seconds (0/sec)
2025-06-16 10:32:01.387 INFO [CSVImporterFormat] <ArcadeDB_0> - Parsed lines......: 7
2025-06-16 10:32:01.387 INFO [CSVImporterFormat] <ArcadeDB_0> - Total edges.......: 0
2025-06-16 10:32:01.387 INFO [CSVImporterFormat] <ArcadeDB_0> - Total linked Edges: 0
2025-06-16 10:32:01.387 INFO [CSVImporterFormat] <ArcadeDB_0> - Skipped edges.....: 6
2025-06-16 10:32:01.388 INFO [PostCommandHandler] <ArcadeDB_0> Error on command execution (PostCommandHandler): Error on importing database
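The summary counters at the end of this log are the quickest way to see what happened: 7 lines were parsed, but 0 edges were created and 6 were skipped. When iterating on large imports, a small script can pull those counters out of a log dump. This is a minimal sketch; the regular expression is based only on the summary lines shown above and is an assumption about the log format, not a documented interface.

```python
import re

# Matches importer summary entries such as "- Parsed lines......: 7" or
# "- Total linked Edges: 0": a dash, a label, optional dot padding, a colon, a count.
SUMMARY_RE = re.compile(r"-\s+([A-Za-z][A-Za-z ]*?)\.*:\s*(\d+)")


def importer_summary(log_text):
    """Return {label: count} for every summary entry found in log_text.

    Later entries overwrite earlier ones with the same label, so when a run imports
    documents and then edges, shared labels such as "Parsed lines" hold the edge counts.
    """
    return {label.strip(): int(count) for label, count in SUMMARY_RE.findall(log_text)}


def edges_silently_skipped(summary):
    """True when edge lines were read but no edge was created."""
    return summary.get("Total edges", 0) == 0 and summary.get("Skipped edges", 0) > 0
```

Applied to the log above, importer_summary reports Parsed lines 7, Total edges 0 and Skipped edges 6, so edges_silently_skipped returns True.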

ruispereira, Jun 16 '25 10:06

It is not working with 25.5.1 for me either. I assume the problem is that the last import command does not know which vertex types are to be connected. So one idea would be to create a base type for Tenant and Supervisor with an id property, i.e. Entity, and pass vertexType=Entity, typeIdProperty=id, and typeIdType=Long in the last import command again. I will try this later this afternoon, but feel free to try it yourself.
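To make the proposal concrete, here is a small sketch that assembles the last import command from this thread with the three extra settings appended. It only builds the SQL string; the helper names are hypothetical, the setting names are taken verbatim from the thread, and the combination is untested (later in the thread it was reported not to work because the importer does not see the inherited properties).

```python
from pathlib import PurePosixPath


def quoted(value):
    """Wrap a setting value in double quotes, as the thread does for edges= and edgeType=."""
    return f'"{value}"'


def import_command(source, settings):
    """Build an 'IMPORT DATABASE <url> WITH key=value, ...' statement.

    source: absolute path of the main file inside the (Linux) container; it is turned
    into a file:// URL. settings: dict of setting name -> value, written verbatim in
    insertion order (wrap values with quoted() where quotes are wanted).
    """
    url = PurePosixPath(source).as_uri()
    with_clause = ", ".join(f"{key}={value}" for key, value in settings.items())
    return f"IMPORT DATABASE {url} WITH {with_clause};"


# The last import command from the thread plus the base-type settings proposed above.
cmd = import_command("/home/arcadedb/bug/empty.csv", {
    "edges": quoted(PurePosixPath("/home/arcadedb/bug/edges.csv").as_uri()),
    "edgesFileType": "csv",
    "edgeType": quoted("Belongs"),
    "vertexType": "Entity",
    "typeIdProperty": "id",
    "typeIdType": "Long",
    "edgeFromField": "Tenant",
    "edgeToField": "Supervisor",
})
print(cmd)
```

Using PurePosixPath.as_uri() produces the file:/// form consistently, which matters here since whether file:// is used was reported to change the error.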

gramian, Jun 16 '25 10:06

Not sure what you mean by a "base type for Tenant and Supervisor"; please provide an example so I can try it in a minute.

ruispereira, Jun 16 '25 11:06

For example:

CREATE VERTEX TYPE Entity;
CREATE PROPERTY Entity.id LONG;

CREATE VERTEX TYPE Tenant EXTENDS Entity;
CREATE VERTEX TYPE Supervisor EXTENDS Entity;

gramian, Jun 16 '25 12:06

Hmm, I just tried it; it seems it does not work, as the importer does not see inherited properties.

gramian, Jun 16 '25 12:06

Ah OK, understood: inheritance from a base type. Got it.

ruispereira, Jun 16 '25 13:06

humm :x

ruispereira, Jun 16 '25 13:06

Any update on this one?

ruispereira, Jun 26 '25 15:06