croissant
croissant copied to clipboard
JSON LD parsing fails when multiple strings given to FileSet.includes
FileSet(..., includes=["xyz"])
works but FileSet(..., includes=["xyz", "wxy"])
does not work, failing with:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "REDACTED/util/croissant.py", line 154, in <module>
test()
File "REDACTED/util/croissant.py", line 141, in test
dataset = mlc.Dataset(jsonld="croissant.json")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 6, in __init__
File "REDACTED/mlcroissant/_src/datasets.py", line 70, in __post_init__
self.metadata = Metadata.from_file(ctx=ctx, file=self.jsonld)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "REDACTED/mlcroissant/_src/structure_graph/nodes/metadata.py", line 429, in from_file
return cls.from_json(ctx=ctx, json_=json_)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "REDACTED/mlcroissant/_src/structure_graph/nodes/metadata.py", line 439, in from_json
jsonld = expand_jsonld(json_, ctx=ctx)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "REDACTED/mlcroissant/_src/core/json_ld.py", line 231, in expand_jsonld
recursively_populate_jsonld(entry_node, id_to_node)
File "REDACTED/mlcroissant/_src/core/json_ld.py", line 171, in recursively_populate_jsonld
value = [recursively_populate_jsonld(child, id_to_node) for child in value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "REDACTED/mlcroissant/_src/core/json_ld.py", line 161, in recursively_populate_jsonld
return recursively_populate_jsonld(entry_node, id_to_node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "REDACTED/mlcroissant/_src/core/json_ld.py", line 171, in recursively_populate_jsonld
value = [recursively_populate_jsonld(child, id_to_node) for child in value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "REDACTED/mlcroissant/_src/core/json_ld.py", line 164, in recursively_populate_jsonld
for key, value in entry_node.copy().items():
^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'copy'
presumably due to
https://github.com/mlcommons/croissant/blob/0f95e04763557929e4f4c6711c108c0d9cf7b818/python/mlcroissant/mlcroissant/_src/core/json_ld.py#L169
trying to recurse into the array of strings by running recursively_populate_jsonld
on str
s when it seems to be intended for dict
s only. One simple fix would be to exit early from recusrively_populate_jsonld
is entry_node
is not a dict
, but maybe this is just masking a deeper problem.
If you want me to open a PR or something, let me know. Thanks.