ExplainaBoard

Rethinking of TaskCategory

Open odashi opened this issue 2 years ago • 3 comments

The library has the TaskCategory struct to group similar tasks. As far as I can see, the library itself provides no functionality for TaskCategory, and it is used only by the web frontend.

  1. If it is a UI-specific thing, it would be better to move the definition to explainaboard-web.
  2. If it is still useful to keep some category-like information in this library, I'd recommend using "tags" associated with each Task, because nested structures tend to become hard to keep consistent (e.g., for multimodal tasks). Tags can also associate a task with multiple groups.
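To make option 2 concrete, here is a minimal sketch of flat tags. Everything in it is an assumption for illustration: the `Task` dataclass below is hypothetical and is not the library's actual class, and the task names and tags are examples only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task record; not the library's actual Task class."""
    name: str
    tags: frozenset = frozenset()


# Illustrative tasks and tags only.
TASKS = [
    Task("machine-translation", frozenset({"text-to-text", "multilingual"})),
    Task("summarization", frozenset({"text-to-text"})),
    Task("image-captioning", frozenset({"image-to-text", "multimodal"})),
]


def tasks_with_tag(tasks, tag):
    """Return the names of all tasks carrying the given tag, in input order."""
    return [t.name for t in tasks if tag in t.tags]


def group_by_tag(tasks):
    """Build a tag -> task names mapping; a task may appear under several tags."""
    groups = {}
    for task in tasks:
        for tag in sorted(task.tags):
            groups.setdefault(tag, []).append(task.name)
    return groups
```

With this shape, a frontend could still derive category-like groupings via `group_by_tag(TASKS)`, while a task such as machine-translation appears under both "text-to-text" and "multilingual" without any nesting.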

odashi avatar Mar 20 '22 17:03 odashi

I think it's definitely a reasonable idea to rethink this.

The original idea behind the task categories is that they would be everything where inputs and outputs are basically the same format. For example:

  • text-to-text: summarization, machine translation, etc.
  • text categorization: sentiment classification, topic classification
  • text pair categorization: NLI, paraphrase detection
  • ...

The reason we have "tasks" underneath the "task categories" is that we might want different bucketing features for different tasks. For example, sentiment classification might want to use a sentiment lexicon, and NLI might want to use features such as whether a negation appears.
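A rough sketch of what task-specific bucketing features could look like. The word lists, function names, and the `BUCKETING_FEATURES` registry are all hypothetical and are not taken from the library; the tokenization is a naive whitespace split for illustration.

```python
# Hypothetical, tiny word lists for illustration only.
SENTIMENT_LEXICON = {"good": 1, "great": 1, "bad": -1, "terrible": -1}
NEGATION_WORDS = {"not", "no", "never"}


def lexicon_score(text):
    """Sum of lexicon polarities over whitespace tokens (a sentiment feature)."""
    return sum(SENTIMENT_LEXICON.get(tok, 0) for tok in text.lower().split())


def has_negation(premise, hypothesis):
    """True if either sentence contains a negation word (an NLI feature)."""
    tokens = (premise + " " + hypothesis).lower().split()
    return any(tok in NEGATION_WORDS for tok in tokens)


# Each task registers its own feature functions, independent of its category.
BUCKETING_FEATURES = {
    "sentiment-classification": {"lexicon_score": lexicon_score},
    "natural-language-inference": {"has_negation": has_negation},
}
```

Examples could then be bucketed by the value of each registered feature, e.g. grouping NLI pairs by whether `has_negation` returns true.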

I'm interested in the idea of "tag"s, but maybe we need to discuss exactly how they'd be implemented.

neubig avatar Mar 20 '22 18:03 neubig

"they would be everything where inputs and outputs are basically the same format"

Does this mean the task category will provide input/output schemas? I would guess these could also be represented as properties of the task itself.

Here I'm thinking of several properties of Task:

  • tags ... just a list of strings that group tasks into certain categories (e.g., machine-translation would have text-to-text, conditional-text-generation, multilingual)
  • input schemas ... a concrete definition of the data format used in the task. I think the Loader class is responsible for the same thing, but in the current implementation the Task itself doesn't know the appropriate Loader (unless it queries the registry).
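A minimal sketch of a task that carries both properties itself. `TaskSpec`, its `validate` method, and the field names `source`, `reference`, and `hypothesis` are assumptions made up for this example; they do not reflect the library's current Task or Loader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical task description carrying its own tags and input schema."""
    name: str
    tags: tuple
    input_schema: dict  # field name -> expected Python type

    def validate(self, example):
        """Return a list of problems in one example (empty if it conforms)."""
        problems = []
        for field_name, expected in self.input_schema.items():
            if field_name not in example:
                problems.append(f"missing field: {field_name}")
            elif not isinstance(example[field_name], expected):
                problems.append(f"wrong type for field: {field_name}")
        return problems


# Tags are the ones from the bullet above; the schema fields are assumed.
MACHINE_TRANSLATION = TaskSpec(
    name="machine-translation",
    tags=("text-to-text", "conditional-text-generation", "multilingual"),
    input_schema={"source": str, "reference": str, "hypothesis": str},
)
```

With the schema attached to the task, a loader could check its output against `MACHINE_TRANSLATION.validate(example)` instead of the task having to look up its loader in the registry.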

odashi avatar Mar 21 '22 10:03 odashi

Hey @odashi thanks for the comments.

Regarding: "input schemas ... a concrete definitions of data format used in the task."

We have done something similar in DataLab, so maybe we could think about how to unify the way we define tasks in explainaboard and datalab.

pfliu-nlp avatar Mar 21 '22 13:03 pfliu-nlp