ExplainaBoard
Plugin feature and related refactoring of the whole structure
Background
This repository hosts all task/metric definitions that ExplainaBoard handles, and the current development style seems to cause several maintenance problems:
- The size of the codebase will become unnecessarily large for most users, who are usually interested in only a few tasks.
- CI will become costly: the tests are too slow and require many network connections. They already take several minutes even though we have only a few dozen task definitions.
- Every contributor has to either open a pull request against this repository or maintain a fork in order to add a new task. We also have to review every proposal, including ones we are not always interested in. This does not scale as the community grows.
Since the objective of ExplainaBoard is to host as many tasks as possible, the problems above will become more serious as the number of tasks (and metrics) grows.
Proposals
Introducing plugin feature
It's time to consider introducing a plugin feature. Python can list installed packages programmatically, and some libraries (e.g., flask, pytest) use this to import additional functionality from other packages.
We can use this functionality to move the "extensible" parts into separate repositories. This brings us several advantages in terms of maintenance:
- Achieves better separation of interests: the "core" library can focus on changes for only core parts, and the "plugin" can focus on extensible parts, such as one task.
- Tests are also split across the other repositories with appropriate granularity.
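As a minimal sketch of how such discovery could work, the function below scans for top-level packages that follow a naming convention and imports them. The prefix `explainaboard_` and the function name `locate_all_plugins` are assumptions made for illustration, not an existing ExplainaBoard API; packaging entry points (via `importlib.metadata`) would be another option.

```python
import importlib
import pkgutil

# Hypothetical naming convention: a plugin is any top-level package
# whose name starts with this prefix.
PLUGIN_PREFIX = "explainaboard_"


def locate_all_plugins(prefix=PLUGIN_PREFIX, path=None):
    """Import every top-level module whose name starts with `prefix`.

    `path` is an optional list of directories to scan; None scans sys.path.
    Returns a dict mapping module name to the imported module.
    """
    plugins = {}
    for module_info in pkgutil.iter_modules(path):
        if module_info.name.startswith(prefix):
            plugins[module_info.name] = importlib.import_module(module_info.name)
    return plugins
```

With this convention, installing a plugin package is enough to make it visible to the core library; the core never needs to know the plugin's name in advance.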
Changing the structure around "task"
To achieve this change, we also need to standardize the definition of "task". The current repository has the Task class, but it holds only a description associated with a name and has no relation to other parts of the repository. Most information related to the "task" is actually categorized by TaskType, and its definition is distributed across multiple parts of the repository (e.g., multiple registries that take a TaskType). If the Task class represents enough information about the "task", we can consolidate these definitions. Specifically, I am considering changing the structure around the task from:
tt = TaskType.foobar
loader = get_loader(tt)
processor = get_processor(tt)
to:
task = get_task("foobar")
loader = task.loader
processor = task.processor
Here I also removed TaskType, since managing the list of tasks as an Enum hinders extensibility (an enum is a fixed, strictly typed set of members and should not be used as a variable collection).
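A minimal sketch of what such a consolidated Task could look like is given below. The field names, the toy loader and processor, and the `_TASKS` mapping are all hypothetical; in a plugin setting the mapping would presumably be built from discovered plugins rather than written by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Task:
    """Hypothetical consolidated task definition (not the current Task class)."""

    name: str
    description: str
    loader: Callable[[str], List[str]]
    processor: Callable[[List[str]], Dict[str, int]]


def load_lines(text: str) -> List[str]:
    """Toy loader: one sample per non-empty line."""
    return [line for line in text.splitlines() if line.strip()]


def count_samples(samples: List[str]) -> Dict[str, int]:
    """Toy processor: report the number of samples."""
    return {"num_samples": len(samples)}


FOOBAR = Task("foobar", "A toy task.", load_lines, count_samples)

# Tasks are looked up by name (a plain string) instead of by Enum member.
_TASKS = {task.name: task for task in (FOOBAR,)}


def get_task(name: str) -> Task:
    """Return the task with the given name; raises KeyError if unknown."""
    return _TASKS[name]


task = get_task("foobar")
samples = task.loader("a b c\n\nd e f\n")
report = task.processor(samples)
```

Because everything about a task hangs off one object, adding a task means adding one Task value rather than touching an Enum plus several lookup tables.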
Avoid registries
The current repository relies on several implementations of a global registry. Registries have many technical disadvantages and should be avoided unless they are really required:
- Since a registry can be modified from any place that can import the library, it could easily become a single point of failure (SPoF) in the codebase.
- It makes the code flow hard to comprehend because the flow is implicitly determined by the import order.
- Especially in Python, it provides no opportunity to clean up the registered objects, even if they are tied to external resources that have to be disposed of correctly.
- It drops typing information, so we cannot get good support from the editor.
In most cases, a registry is not necessary to achieve the same behavior. The registry has mainly two use cases in this repository, and each has an alternative:
- If we need to use a specific feature, we can import and instantiate it directly. This keeps the correct typing information too.
- If we need to collect the functionalities that are implemented against the same interface, we can introduce a common convention for declaring the list of functionalities.
For example, we can introduce the following syntax in __init__.py of the plugin package:

# foo_task.py
from explainaboard import Task

class FooTask(Task):
    @classmethod
    def name(cls):
        return "Foo"
    ...

# __init__.py
from a_plugin import FooTask, BarTask

# List of tasks that are exported from this package.
TASKS = [FooTask, BarTask]

and we can collect these definitions programmatically:

all_tasks = {}
for plugin in locate_all_plugins():  # import plugins one-by-one
    for task in get_tasks(plugin):  # yield from __init__.TASKS
        all_tasks[task.name()] = task
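The collection step can be sketched in a self-contained way as follows. Here `get_tasks` and `collect_tasks` are hypothetical helpers, the task classes are stand-ins that do not inherit from the real Task, and the plugin modules are built in memory with `types.ModuleType` purely for illustration. The duplicate-name check is an added assumption about how name clashes between plugins might be handled.

```python
import types


def get_tasks(plugin):
    """Yield the task classes a plugin module exports via its TASKS list."""
    yield from getattr(plugin, "TASKS", [])


def collect_tasks(plugins):
    """Build a name -> task class mapping, rejecting duplicate names."""
    all_tasks = {}
    for plugin in plugins:
        for task in get_tasks(plugin):
            name = task.name()
            if name in all_tasks:
                raise ValueError(f"Duplicate task name: {name!r}")
            all_tasks[name] = task
    return all_tasks


# Stand-ins for real plugin packages, built in memory for illustration.
class FooTask:
    @classmethod
    def name(cls):
        return "Foo"


class BarTask:
    @classmethod
    def name(cls):
        return "Bar"


a_plugin = types.ModuleType("a_plugin")
a_plugin.TASKS = [FooTask, BarTask]
empty_plugin = types.ModuleType("empty_plugin")  # exports no tasks

all_tasks = collect_tasks([a_plugin, empty_plugin])
```

Unlike a global registry, the resulting mapping is an ordinary local value: it is built in one explicit place and does not depend on import order.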
@neubig @pfliu-nlp RFC
Hi @odashi, thanks for the proposal! Overall, the ideas look very nice in terms of both code maintenance and interoperability. Just sharing some comments:
Regarding plugin feature
"Achieves better separation of interests: the "core" library can focus on changes for only core parts, and the "plugin" can focus on extensible parts, such as one task."
If we define the separation by task, wouldn't we still suffer from the 2nd and 3rd issues that you listed at the beginning of this proposal?
Changing the structure around "task"
I agree it would be nice if we could define a task as a class with member functions such as loaders and processors.
(Then finally, we will have several powerful and well-defined classes: datasets, tasks, metrics, which almost define NLP.)
Avoid registries
I'm fine with better alternatives to the registries.
@pfliu-nlp Yes, we would need to work on 2 and 3 first, then 1.
datasets, tasks, metrics, which almost define NLP.
This is a good point, but we also need to keep in mind that the "task" in this repository may confuse the users because it is not the actual process of the task itself (e.g., translating source to target in machine translation tasks).
Maybe we also need to determine "protocols" (shared data format) between datasets, tasks, and metrics, to achieve better separation.
For example,
- The dataset defines "available datatype" $D_a$ that the dataset can provide:
class FooDataset(Dataset):
    @classmethod
    def get_available_datatype(cls):
        return [("ref", list[list[str]]), ("hyp", list[str])]  # list of (name, type)
- The task defines "required datatype" $D_r$ that the task requires:
class BarTask(Task):
    @classmethod
    def get_required_datatype(cls):
        return [("ref", list[list[str]]), ("hyp", list[str])]  # list of (name, type)
- The main routine compares both datatypes (explainaboard --dataset Foo --task Bar). If $D_r \subseteq D_a$, the program determines that they can be connected.
This approach can loosen the tight coupling between the dataset and the task, and could allow users to add a new dataset/task without knowing anything beyond the "datatype" provided. Maybe a similar approach can be introduced between task and metric.
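The compatibility check described above could be sketched roughly like this. The helper `can_connect`, the extra "src" field, and the `BazTask` counterexample are hypothetical, and the classes are standalone stand-ins rather than subclasses of the real Dataset and Task. Built-in generics such as `list[str]` require Python 3.9 or later.

```python
class FooDataset:
    @classmethod
    def get_available_datatype(cls):
        # list of (name, type); "src" is an extra field the task does not need
        return [("ref", list[list[str]]), ("hyp", list[str]), ("src", list[str])]


class BarTask:
    @classmethod
    def get_required_datatype(cls):
        return [("ref", list[list[str]]), ("hyp", list[str])]


class BazTask:
    @classmethod
    def get_required_datatype(cls):
        # "ref" has a different type than the one FooDataset provides
        return [("ref", list[str]), ("hyp", list[str])]


def can_connect(dataset, task):
    """Return True if every required (name, type) pair is available."""
    available = dict(dataset.get_available_datatype())
    return all(
        name in available and available[name] == typ
        for name, typ in task.get_required_datatype()
    )
```

In this sketch a datatype matches only when both the name and the exact type annotation agree; a real implementation might want a looser notion of type compatibility.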
I'm looking through old issues on GitHub, and just wanted to say that I like this direction a lot and I think we're moving towards it gradually. A stricter definition of "required datatypes" would also be useful in other places in the code, like when we define feature functions, etc.