ExplainaBoard icon indicating copy to clipboard operation
ExplainaBoard copied to clipboard

Centralized type registry

Open odashi opened this issue 2 years ago • 7 comments

Currently we have several registry instances in the repository. Some of them are implemented as TypeRegistry that require objects not to self-serialize inner objects, and it can't serialize nested objects that are registered in different registries.

I think introducing a unified type registry that covers all types could be the solution of this problem, and it also avoids multiple registry definitions from this repository. One caveat over this solution is name collision: registering multiple types with the same name, which results in runtime error. I believe there are no classes that shares the same name, but we couldn't guarantee it in the future.

Since TypeRegistry supports any string to register a class, I think we can introduce some naming convention to the registered names. For example:

  • FooMetricConfg -> core.metric_configs.Foo
  • FooAnalyzer -> core.analyzers.Foo

Here I also added the prefix core, which indicates that the class is in this repository. It is intended to avoid potential name collisions between this repository and third-parties.

RFC: @neubig @pfliu-nlp @tetsuok how do you think about it?

odashi avatar Sep 03 '22 06:09 odashi

The proposed solution looks good to me. I think we need to document the naming convention so that the convention is easy to understand and follow. If we can set up static analysis or some sort of unit tests to automatically check the convention in pull requests, that's best.

tetsuok avatar Sep 05 '22 04:09 tetsuok

Still RFC: @neubig @pfliu-nlp

odashi avatar Sep 05 '22 06:09 odashi

That basically seems fine to me.

One alternative to core would be explainaboard, as that would be the thing that they would actually write in an import statement. It's a bit more verbose though, so I don't have a strong preference either way.

neubig avatar Sep 05 '22 10:09 neubig

"Currently we have several registry instances in the repository. Some of them are implemented as TypeRegistry that require objects not to self-serialize inner objects, and it can't serialize nested objects that are registered in different registries."

True. I also realized this. We actually have some nested object to be serialized, which happens in different Feature objects. For example, here is some code for the serialization of nested object: https://github.com/neulab/ExplainaBoard/blob/main/explainaboard/analysis/feature.py

"Here I also added the prefix core, which indicates that the class is in this repository. It is intended to avoid potential name collisions between this repository and third-parties."

Looks good to me.

pfliu-nlp avatar Sep 05 '22 13:09 pfliu-nlp

I'm planning to apply this change after the all manual serialization was replaced by the serialization subpackage. #419

odashi avatar Sep 10 '22 00:09 odashi

Remaining tasks:

The ultimate goal of this is to serialize SysOutputInfo by the serialization subpackage. It requires to implement the Serializable interface for following classes:

  • All classes in analyses package
  • BucketPerformance

odashi avatar Oct 14 '22 10:10 odashi

Sounds great, thanks for the update.

neubig avatar Oct 14 '22 12:10 neubig