unitxt
Artifact.from_dict recursively interprets fields that it shouldn't
Artifact.from_dict tries to convert every dictionary it encounters into an artifact, even when the dictionary is just a mapping of dataset column names.
As a result, a dataset cannot have a column named type:
import tempfile

import pandas as pd
from unitxt.artifact import Artifact

with tempfile.TemporaryDirectory() as tmpdir:
    filename = f"{tmpdir}/bug.csv"
    df = pd.DataFrame([{"type": f"test_{i}", "row": i} for i in range(10)])
    df.to_csv(filename, index=False)
    loader = Artifact.from_dict(
        {
            "type": "sequential_recipe",
            "steps": [
                {"type": "load_csv", "files": {"test": filename}},
                {"type": "rename_fields", "field_to_field": {"row": "myrow", "type": "bug_found"}},
            ],
        }
    )
    print(list(loader()["test"]))
This will result in the following exception:
  File "/Users/radek/Library/Caches/pypoetry/virtualenvs/fmaas-eval-_3vZ4Wue-py3.11/lib/python3.11/site-packages/unitxt/artifact.py", line 214, in _recursive_load
    cls.verify_artifact_dict(obj)
  File "/Users/radek/Library/Caches/pypoetry/virtualenvs/fmaas-eval-_3vZ4Wue-py3.11/lib/python3.11/site-packages/unitxt/artifact.py", line 149, in verify_artifact_dict
    raise UnrecognizedArtifactTypeError(d["type"])
unitxt.artifact.UnrecognizedArtifactTypeError: 'bug_found' is not a recognized artifact 'type'. Make sure a the class defined this type (Probably called 'BugFound' or similar) is defined and/or imported anywhere in the code executed.
The code works when you remove "type": "bug_found" from field_to_field.
Yes, type is currently a reserved name. Maybe we need to change it to __type__ (@elronbandel)?
I thought we had done that already. Of course we should. We can also make the recursion skip any dictionary that has no __type__ key.
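To make the proposal concrete, here is a minimal standalone sketch of a recursive loader that only treats a dictionary as an artifact when it carries a __type__ key. It is a toy model, not unitxt's actual implementation: REGISTRY, RenameFields, and recursive_load are hypothetical names invented for illustration, and the real class lookup in unitxt may work differently.

```python
# Toy model of recursive artifact loading; NOT unitxt's real code.
# REGISTRY, RenameFields and recursive_load are hypothetical names.
from dataclasses import dataclass, field


@dataclass
class RenameFields:
    field_to_field: dict = field(default_factory=dict)


# Maps a type name to the class that implements it (assumed registry).
REGISTRY = {"rename_fields": RenameFields}


def recursive_load(obj, type_key="__type__"):
    """Build registered classes from nested dicts that carry `type_key`."""
    if isinstance(obj, list):
        return [recursive_load(item, type_key) for item in obj]
    if not isinstance(obj, dict):
        return obj
    # Load the values first so nested artifacts are built bottom-up.
    loaded = {key: recursive_load(value, type_key) for key, value in obj.items()}
    if type_key not in loaded:
        # Plain data dictionary (for example a column mapping): leave it alone.
        return loaded
    type_name = loaded.pop(type_key)
    if type_name not in REGISTRY:
        raise ValueError(f"{type_name!r} is not a recognized artifact type")
    return REGISTRY[type_name](**loaded)


step = recursive_load(
    {
        "__type__": "rename_fields",
        "field_to_field": {"row": "myrow", "type": "bug_found"},
    }
)
print(step.field_to_field)  # → {'row': 'myrow', 'type': 'bug_found'}
```

Calling the same function with type_key="type" on the original payload reproduces the failure mode in this toy model, because the inner column mapping is then mistaken for an artifact of type bug_found. Note that a column literally named __type__ would still collide under this scheme; the dunder name only makes such a clash much less likely.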