For issue 1575: Eliminating Manual Class Registration in Unitxt, replaced by Import Paths
In the catalog, __type__ is now expressed as a dict, {module: module, name: class_name}, and classes are instantiated from it through Python's import utilities.
Consequently, if a class c is defined in a file at path p and c is referenced in a catalog entry, the reference goes through p and therefore ties the defining file to its location in the file system. This is the same constraint imposed by a line of code reading from p import c: if the defining file moves, the reference in the catalog must be updated, just as any such import line must be.
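To make the mechanism concrete, here is a minimal sketch of resolving a dict-valued __type__ with importlib. It is not the PR's actual code: the helper names resolve_type and instantiate are hypothetical, the key names "module" and "name" follow the dict shape described above, and a standard-library class stands in for a unitxt class.

```python
import importlib
from typing import Any, Dict


def resolve_type(type_ref: Dict[str, str]) -> type:
    """Return the class named by a {"module": ..., "name": ...} reference."""
    # Same effect as `from <module> import <name>`, so it fails the same way
    # if the defining module is moved or renamed.
    module = importlib.import_module(type_ref["module"])
    return getattr(module, type_ref["name"])


def instantiate(entry: Dict[str, Any]) -> Any:
    """Build an object from a catalog-style entry (hypothetical helper)."""
    # Every key other than __type__ is passed to the constructor.
    kwargs = {k: v for k, v in entry.items() if k != "__type__"}
    return resolve_type(entry["__type__"])(**kwargs)


# Illustration only: fractions.Fraction stands in for a unitxt class.
entry = {
    "__type__": {"module": "fractions", "name": "Fraction"},
    "numerator": 3,
    "denominator": 4,
}
obj = instantiate(entry)
```

A reference to a module that no longer exists at the recorded path raises ModuleNotFoundError, which is the same failure a stale import line would produce.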
Backward Compatibility:
(1) A utility, utils/prepare_all_artifacts.py, is provided; it transforms a given catalog to the new format by running the set of prepare modules. It needs to be invoked once per project.
(2) A catalog that has not been converted, i.e. one still in the old format (where __type__ has a string value, the snake-case form of the class name), can still be read and used by the PR's code: upon loading from the catalog, the code translates __type__ to the dict format. This works for every __type__ that refers to a unitxt class (a class in the unitxt/src/unitxt directory). Yet to be developed: on-the-fly translation of __type__ values that refer to user-defined classes.
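The load-time translation in (2) could look roughly like the sketch below. This is an assumption-laden illustration, not the PR's implementation: camel_to_snake, build_legacy_index and upgrade_type are hypothetical names, the snake-case rule shown is a simple one that may differ from unitxt's own, and the standard-library collections module stands in for a unitxt module.

```python
import importlib
import inspect
import re
from typing import Dict, Union

TypeRef = Dict[str, str]


def camel_to_snake(name: str) -> str:
    """Simple CamelCase -> snake_case rule (assumed, not unitxt's exact rule)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def build_legacy_index(module_name: str) -> Dict[str, TypeRef]:
    """Map each legacy snake-case name to the new dict form for one module."""
    module = importlib.import_module(module_name)
    index: Dict[str, TypeRef] = {}
    for name, cls in inspect.getmembers(module, inspect.isclass):
        # Keep only classes defined in this module, not ones it imports.
        if cls.__module__ == module_name:
            index[camel_to_snake(name)] = {"module": module_name, "name": name}
    return index


def upgrade_type(type_value: Union[str, TypeRef], index: Dict[str, TypeRef]) -> TypeRef:
    """Return the dict form, translating a legacy string when necessary."""
    if isinstance(type_value, dict):
        return type_value  # already in the new format
    return index[type_value]  # KeyError for names the index does not know


# Illustration only: `collections` stands in for a unitxt module.
index = build_legacy_index("collections")
upgraded = upgrade_type("chain_map", index)
```

An index of this kind can only cover modules that are known in advance, which is consistent with the translation currently being limited to unitxt's own classes: a user-defined class lives in a module the loader has no way to enumerate beforehand.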
For: https://github.com/IBM/unitxt/issues/1575
(virtual310) dafna@LAPTOP-ICP8MAPV:~/workspaces/unitxt$ git diff main...json --name-only | grep -v /catalog/
.github/workflows/catalog_consistency.yml
docs/catalog.py
docs/conf.py
prepare/cards/mtrag.py
prepare/metrics/custom_f1.py
prepare/tasks/qa/tasks.py
src/unitxt/artifact.py
src/unitxt/catalog.py
src/unitxt/dataset_utils.py
src/unitxt/deprecation_utils.py
src/unitxt/register.py
src/unitxt/settings_utils.py
src/unitxt/text_utils.py
tests/library/test_artifact.py
tests/library/test_artifact_recovery.py
tests/library/test_artifact_registration.py
tests/library/test_catalogs.py
tests/library/test_function_operators.py
tests/library/test_recipe.py
tests/library/test_text_utils.py
utils/check_catalog_consistency.py
utils/prepare_all_artifacts.py
What is the status of this PR? I think it's an important change.
@elronbandel, following up on Yoav's question: does the intro to this PR address your concern (which I am not sure I understand)?