Undocumented requirements for input nodes/edges file names?
Describe the bug
The neo4j-upload CLI command fails to upload my KGX-formatted JSON Lines files and emits the following warnings:
[KGX][jsonl_source.py][ parse] WARNING: Parse function cannot resolve the KGX file type in name nodes-tiny.jsonl. Skipped...
[KGX][jsonl_source.py][ parse] WARNING: Parse function cannot resolve the KGX file type in name edges-tiny.jsonl. Skipped...
To Reproduce
You can reproduce this by running the following command, where nodes-tiny.jsonl and edges-tiny.jsonl are any KGX-formatted nodes/edges JSON Lines files (and Neo4j is running on localhost):
kgx neo4j-upload --uri bolt://localhost:7687 --username neo4j --password [password] --input-format jsonl nodes-tiny.jsonl edges-tiny.jsonl
Expected behavior
I would expect that command to upload my files to Neo4j successfully.
Additional context
I eventually figured out that if I rename my nodes/edges files so that they end with nodes.jsonl and edges.jsonl, then the command completes successfully. In other words, this command works normally (it differs only in the file names):
kgx neo4j-upload --uri bolt://localhost:7687 --username neo4j --password [password] --input-format jsonl tiny-nodes.jsonl tiny-edges.jsonl
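As a stopgap, the renaming can be scripted. The following is a minimal sketch, assuming only the behavior observed above (names ending in nodes.jsonl or edges.jsonl are accepted); compliant_name is a hypothetical helper, not part of KGX, and it only computes the new name without touching any files.

```python
from pathlib import Path


def compliant_name(path: str) -> str:
    """Return a file name whose stem ends in 'nodes' or 'edges'.

    Hypothetical helper (not part of KGX): it moves the 'nodes'/'edges'
    token to the end of the stem, e.g. nodes-tiny.jsonl -> tiny-nodes.jsonl.
    """
    p = Path(path)
    stem = p.stem  # file name without the final extension
    for kind in ("nodes", "edges"):
        if kind in stem:
            # Drop the first occurrence of the token and tidy leftover separators.
            rest = stem.replace(kind, "", 1).strip("-_.")
            new_stem = f"{rest}-{kind}" if rest else kind
            return str(p.with_name(new_stem + p.suffix))
    raise ValueError(f"no 'nodes' or 'edges' token in file name: {path}")
```

The result can be passed to Path.rename (or used to create a symlink) before calling kgx neo4j-upload.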
I might have missed it, but I don't see this file naming requirement in the documentation. Could this requirement either be made looser (e.g., require that nodes/edges is anywhere in the file name, rather than at the end?), or be documented clearly somewhere?
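To make the looser rule concrete, here is a rough sketch of what I have in mind. It is not KGX's actual parsing code; infer_kind is a hypothetical function, and treating a name that contains both tokens as unresolved is my own assumption.

```python
import re
from pathlib import Path
from typing import Optional


def infer_kind(filename: str) -> Optional[str]:
    """Guess whether a file holds nodes or edges from its base name.

    Hypothetical sketch: accepts 'nodes' or 'edges' as a separate token
    anywhere in the name, not only at the end.
    """
    name = Path(filename).name.lower()
    # Split on anything that is not a letter or digit ('-', '_', '.', ...).
    tokens = re.split(r"[^a-z0-9]+", name)
    found = {kind for kind in ("nodes", "edges") if kind in tokens}
    # Resolve only when exactly one of the two tokens is present.
    return found.pop() if len(found) == 1 else None
```

Under this rule, nodes-tiny.jsonl, tiny-nodes.jsonl, and test_nodes.jsonl would all resolve to nodes, while a name with neither token (or both) would still be skipped.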
(As a side note, I see that the KGX specification lists file names as nodes.jsonl and edges.jsonl, but it doesn't appear that that exact naming is actually expected in practice - examples in the kgx package documentation use different file names, like test_nodes.jsonl (here).)