ExplainaBoard
ExplainaBoard copied to clipboard
Centralized type registry
Currently we have several registry instances in the repository. Some of them are implemented as TypeRegistry
that require objects not to self-serialize inner objects, and it can't serialize nested objects that are registered in different registries.
I think introducing a unified type registry that covers all types could be the solution of this problem, and it also avoids multiple registry definitions from this repository. One caveat over this solution is name collision: registering multiple types with the same name, which results in runtime error. I believe there are no classes that shares the same name, but we couldn't guarantee it in the future.
Since TypeRegistry
supports any string to register a class, I think we can introduce some naming convention to the registered names. For example:
- FooMetricConfg ->
core.metric_configs.Foo
- FooAnalyzer ->
core.analyzers.Foo
Here I also added the prefix core
, which indicates that the class is in this repository. It is intended to avoid potential name collisions between this repository and third-parties.
RFC: @neubig @pfliu-nlp @tetsuok how do you think about it?
The proposed solution looks good to me. I think we need to document the naming convention so that the convention is easy to understand and follow. If we can set up static analysis or some sort of unit tests to automatically check the convention in pull requests, that's best.
Still RFC: @neubig @pfliu-nlp
That basically seems fine to me.
One alternative to core
would be explainaboard
, as that would be the thing that they would actually write in an import
statement. It's a bit more verbose though, so I don't have a strong preference either way.
"Currently we have several registry instances in the repository. Some of them are implemented as TypeRegistry that require objects not to self-serialize inner objects, and it can't serialize nested objects that are registered in different registries."
True. I also realized this. We actually have some nested object to be serialized, which happens in different Feature
objects.
For example, here is some code for the serialization of nested object: https://github.com/neulab/ExplainaBoard/blob/main/explainaboard/analysis/feature.py
"Here I also added the prefix core, which indicates that the class is in this repository. It is intended to avoid potential name collisions between this repository and third-parties."
Looks good to me.
I'm planning to apply this change after the all manual serialization was replaced by the serialization
subpackage. #419
Remaining tasks:
The ultimate goal of this is to serialize SysOutputInfo
by the serialization
subpackage. It requires to implement the Serializable
interface for following classes:
- All classes in
analyses
package - BucketPerformance
Sounds great, thanks for the update.