Eliminating Manual Class Registration in Unitxt with Import Paths
Problem Statement
In Unitxt, every artifact in the catalog includes a __type__ field in its JSON representation. This field stores the class that was used to instantiate the artifact, which is necessary for loading it back into a Python instance.
Currently, Unitxt relies on a class registry that maps a prettified class name to its actual class. The __type__ field stores the prettified name, and when an artifact is loaded, this name is used to look up the original class in the registry.
However, this approach introduces several challenges:
- Manual Class Registration – Any class that might appear in the catalog must be registered in advance.
- Import Dependencies – Users must explicitly import all custom classes used in the catalog within any code accessing it. This can be difficult to debug and communicate to users.
- Ongoing Maintenance – Users frequently encounter this issue and must manually maintain the solution.
Proposed Solution
Instead of storing a prettified name, we propose changing the __type__ field to store:
- A full import path (e.g., "unitxt.loaders.LoadHF") for globally available classes.
- A relative import path (e.g., ".MyOperator") based on a registered folder.
By default, the current working directory will be automatically registered, making the system more intuitive for small projects running locally.
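A rough sketch of how such a `__type__` value could be resolved is shown below. It rests on assumptions that the proposal does not spell out: `resolve_type` and `REGISTERED_DIRS` are hypothetical names, and relative paths are assumed to take the form ".module.ClassName". The shorter ".MyOperator" form from the example above leaves the module lookup unspecified, so the sketch does not try to guess it.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

# Hypothetical list of registered folders; the proposal registers
# the current working directory by default.
REGISTERED_DIRS = [os.getcwd()]


def resolve_type(type_path, registered_dirs=None):
    """Resolve 'pkg.module.Class' or (assumed form) '.module.Class' to a class."""
    dirs = REGISTERED_DIRS if registered_dirs is None else registered_dirs
    relative = type_path.startswith(".")
    module_path, _, class_name = type_path.lstrip(".").rpartition(".")
    if not module_path:
        raise ValueError(f"{type_path!r} does not name a module")
    if not relative:
        # Full import path: a plain import is enough.
        return getattr(importlib.import_module(module_path), class_name)
    for folder in dirs:
        # Relative path: try each registered folder in turn.
        sys.path.insert(0, folder)
        try:
            module = importlib.import_module(module_path)
        except ModuleNotFoundError:
            continue
        finally:
            sys.path.remove(folder)
        return getattr(module, class_name)
    raise ImportError(f"cannot resolve {type_path!r} in {dirs}")


# Full path to a standard-library class.
decoder_cls = resolve_type("json.JSONDecoder")

# Relative path against a temporary "registered" folder.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "my_ops.py").write_text("class MyOperator:\n    pass\n", encoding="utf-8")
    local_cls = resolve_type(".my_ops.MyOperator", registered_dirs=[folder])
```

With this approach no class needs to be registered in advance; the class only has to be importable from its module or from a registered folder.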
Benefits of the Proposed Change
- No More Manual Class Registration – Libraries using Unitxt will no longer need to register their classes manually.
- Improved Usability for Small Projects – Projects operating within a single working directory will work seamlessly using relative imports.
- Support for Larger Projects – Projects without a formal package structure can register their main directories and use relative imports.
This change will make Unitxt more user-friendly, reduce setup complexity, and improve error handling.
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
This is still being worked on, right?
This is still important.
Still being worked on in #1713.
@yoavkatz, the catalog format suggested in PR #1713 is evidently not backward compatible: the __type__ field is defined differently, as a module and class (in the PR) rather than a snake_case name (as it has been thus far). @elronbandel identified this as a problem that prevents the PR from being accepted. The PR contains a utility that runs all the prepare files to 'face-lift' the catalog. The utility needs to be run once per project (over the project's prepare files), so backward compatibility is resolved within minutes. Please share your view on this issue.
This is still an important issue that people struggle with.
Hi @yoavkatz, I have already prepared a PR that solves this for all unitxt classes.
The version uses a new unitxt catalog, where each __type__ is expressed as a dict of module and class.
However, being backward compatible, it can also work with the current unitxt catalog, where each __type__ is expressed as the snake_case name of the relevant class (with no reference to any module). The PR finds the module (mainly) through a simple grep over the files under src/unitxt.
In other words, we can offer users a version that does not invoke register_all_artifacts upfront (in the __init__) and instead uses a grep, only for the classes that are actually needed. No other change to the code or the catalog is required.
Would that be of interest to you until @elronbandel decides what to do with the full PR mentioned above?
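For readers following along, the grep-style lookup could look roughly like the sketch below. It is not taken from the PR: `find_class` and `normalize_type` are hypothetical names, the dict keys "module" and "class" are assumed, and matching a snake_case name to a class by ignoring case and underscores is just one plausible rule.

```python
import re
import tempfile
from pathlib import Path

# Matches top-level class definitions and captures the class name.
CLASS_DEF = re.compile(r"^class\s+(\w+)", re.MULTILINE)


def find_class(snake_name, source_root):
    """Scan .py files under source_root; return (module, class_name) or None."""
    target = snake_name.replace("_", "").lower()
    root = Path(source_root)
    for path in sorted(root.rglob("*.py")):
        for class_name in CLASS_DEF.findall(path.read_text(encoding="utf-8")):
            if class_name.lower() == target:
                module = ".".join(path.relative_to(root).with_suffix("").parts)
                return module, class_name
    return None


def normalize_type(type_field, source_root):
    """Accept both catalog formats: a module/class dict or a snake_case name."""
    if isinstance(type_field, dict):
        # Assumed key names for the new catalog format.
        return type_field["module"], type_field["class"]
    return find_class(type_field, source_root)


# Demonstration on a throwaway source tree.
with tempfile.TemporaryDirectory() as root:
    Path(root, "loaders.py").write_text("class LoadHF:\n    pass\n", encoding="utf-8")
    found = normalize_type("load_hf", root)
    passthrough = normalize_type({"module": "loaders", "class": "LoadHF"}, root)
```

The search runs only for the type names that are actually encountered, so nothing has to be registered upfront.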