
Undocumented requirements for input nodes/edges file names?

Open amykglen opened this issue 1 year ago • 0 comments

Describe the bug: The neo4j-upload CLI command fails to upload my KGX-formatted JSON Lines files and emits the following warnings:

[KGX][jsonl_source.py][               parse] WARNING: Parse function cannot resolve the KGX file type in name nodes-tiny.jsonl. Skipped...
[KGX][jsonl_source.py][               parse] WARNING: Parse function cannot resolve the KGX file type in name edges-tiny.jsonl. Skipped...

To Reproduce: Run the following command, where nodes-tiny.jsonl and edges-tiny.jsonl are any KGX-formatted nodes/edges JSON Lines files, with Neo4j running on localhost:

kgx neo4j-upload --uri bolt://localhost:7687 --username neo4j --password [password] --input-format jsonl nodes-tiny.jsonl edges-tiny.jsonl

Expected behavior: I would expect that command to upload my files to Neo4j successfully.

Additional context: I eventually figured out that if I rename my nodes/edges files so that they end with nodes.jsonl and edges.jsonl, the command completes successfully. In other words, this command, which differs only in the file names, works normally:

kgx neo4j-upload --uri bolt://localhost:7687 --username neo4j --password [password] --input-format jsonl tiny-nodes.jsonl tiny-edges.jsonl
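Until the requirement is relaxed or documented, one workaround is to compute a name that ends in nodes/edges before calling the CLI. The sketch below is only an illustration: `suffix_style_name` is a hypothetical helper, not part of KGX, and it assumes that the only thing that matters is that the name (minus its extension) ends with `nodes` or `edges`, which is what I observed.

```python
import re
from pathlib import Path


def suffix_style_name(filename: str) -> str:
    """Return a file name whose stem ends in 'nodes' or 'edges'.

    Hypothetical helper (not part of KGX). Assumes a single extension
    such as '.jsonl' and that the stem mentions 'nodes' or 'edges'.
    """
    path = Path(filename)
    stem = path.stem
    for kind in ("nodes", "edges"):
        if kind not in stem:
            continue
        if stem.endswith(kind):
            # Already in the form that worked for me; leave it alone.
            return filename
        # Move the first occurrence of the kind to the end of the stem.
        rest = stem.replace(kind, "", 1)
        rest = re.sub(r"[-_.]{2,}", "-", rest).strip("-_.")
        new_stem = f"{rest}-{kind}" if rest else kind
        return str(path.with_name(new_stem + path.suffix))
    raise ValueError(f"{filename!r} does not mention 'nodes' or 'edges'")
```

For example, `suffix_style_name("nodes-tiny.jsonl")` gives `tiny-nodes.jsonl`; the file can then be renamed or copied to that name (for instance with `Path.rename` or `shutil.copy`) before running the upload.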

I might have missed it, but I don't see this file naming requirement in the documentation. Could this requirement either be loosened (e.g., by accepting nodes/edges anywhere in the file name rather than only at the end) or be documented clearly somewhere?
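To make the request concrete, here is a minimal sketch of the two rules. It is not KGX's actual code: `resolve_strict` only mirrors the behavior I observed (the name must end with nodes/edges before the extension), and `resolve_loose` is one possible relaxed rule. Both function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

KINDS = ("nodes", "edges")


def resolve_strict(filename: str) -> Optional[str]:
    """Observed behavior (assumed): the stem must END with the kind."""
    stem = Path(filename).stem
    for kind in KINDS:
        if stem.endswith(kind):
            return kind
    return None


def resolve_loose(filename: str) -> Optional[str]:
    """Proposed rule: the kind may appear ANYWHERE in the stem.

    Returns None when the name is ambiguous (mentions both kinds)
    or mentions neither.
    """
    stem = Path(filename).stem
    matches = [kind for kind in KINDS if kind in stem]
    return matches[0] if len(matches) == 1 else None
```

Under the strict rule, `nodes-tiny.jsonl` resolves to nothing while `tiny-nodes.jsonl` and `test_nodes.jsonl` resolve to `nodes`; under the loose rule all three resolve to `nodes`. A name that mentions both kinds is rejected by the loose rule so that it never guesses.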

(As a side note, the KGX specification lists the file names as nodes.jsonl and edges.jsonl, but that exact naming doesn't appear to be expected in practice: examples in the kgx package documentation use other file names, such as test_nodes.jsonl.)

amykglen, Oct 25 '23 20:10