
TypeError: ElementMetadata.__init__() got an unexpected keyword argument 'key3'

Open shiralkarprashant opened this issue 5 months ago • 1 comments

In trying to load a JSON file (structured as below) with a call to elements = partition(filename=f), I get the error message in the title.

[
    {
        "key1": "val1",
        "key2":
        [
            "val2"
        ],
        "metadata":
        {
            "key3": "val3"
        }
    }
]

After digging into the Unstructured code, I found that the JSON file loads fine, but converting the loaded dict to elements fails at the line below. The code treats the 'metadata' key in the input file as metadata about the document, whereas in my file it holds use-case-specific metadata that I'd like to keep as part of the document text. So the two meanings of 'metadata' conflict. Is there a way to avoid this in the unstructured library?

https://github.com/Unstructured-IO/unstructured/blob/5defe79bf24d503b8ad6ed6de1a69f20c7cec47b/unstructured/staging/base.py#L134
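
One possible workaround for the key collision described above is to rename the top-level 'metadata' key before handing the data to the library. This is only a minimal sketch using the standard library: the helper name rename_conflicting_key and the replacement key 'user_metadata' are hypothetical, and whether partition then yields useful elements for such a file is not verified here.

```python
import json

# Hypothetical helper: rename a top-level key that collides with a name the
# library reserves (here "metadata"). The new key name is an arbitrary choice.
def rename_conflicting_key(records, old_key="metadata", new_key="user_metadata"):
    """Return a copy of `records` with `old_key` renamed in every top-level dict."""
    return [
        {(new_key if key == old_key else key): value for key, value in record.items()}
        for record in records
    ]

# Same shape as the file in this issue.
raw = '[{"key1": "val1", "key2": ["val2"], "metadata": {"key3": "val3"}}]'
records = json.loads(raw)
cleaned = rename_conflicting_key(records)

# The cleaned records could be written to a new file with json.dump and that
# file passed to partition(filename=...) instead of the original.
print(json.dumps(cleaned, indent=4))
```

The original records are left unmodified, so the rename can be undone or compared against the source file.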

shiralkarprashant avatar Feb 07 '24 12:02 shiralkarprashant

@shiralkarprashant - Could you send the code block to reproduce the error? I gave partition a try with the example dictionary and got [] as the output instead of the error. Either way, [] is probably not what we want here, and we can fall back to processing as text until we have a more sophisticated JSON parser.
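
Until then, "processing as text" could be approximated on the user side by flattening the JSON into plain lines first. The sketch below is an assumption about what such a fallback might look like, not the library's implementation; json_to_text is a hypothetical helper, and the path notation it prints is an arbitrary choice.

```python
import json

# Hypothetical helper: flatten nested JSON into "path: value" lines so that
# user keys such as "metadata" end up in the document text.
def json_to_text(value, prefix=""):
    lines = []
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            lines.extend(json_to_text(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            lines.extend(json_to_text(child, f"{prefix}[{index}]"))
    else:
        # Leaf value (string, number, bool, or null).
        lines.append(f"{prefix}: {value}")
    return lines

raw = '[{"key1": "val1", "key2": ["val2"], "metadata": {"key3": "val3"}}]'
text_lines = json_to_text(json.loads(raw))
text = "\n".join(text_lines)
print(text)
```

The resulting string could then be given to a text partitioner (for example partition_text(text=...), assuming that entry point fits the installed version) rather than the JSON path.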

MthwRobinson avatar Feb 21 '24 18:02 MthwRobinson