_TAGS for fine-grained tasks

Open sg-wbi opened this issue 2 years ago • 2 comments

INTRO:

This introduces a metadata attribute _TAGS.

The values in _TAGS are tags that further classify the tasks a dataset supports.

These tags are meant to be used together w/ _SUPPORTED_TASKS.

I tried to be as generic as possible in order to keep the set of tags as small as possible, while making sure that every combination of a tag w/ _SUPPORTED_TASKS still makes sense.

For instance, bc5cdr will have the following combination of fine-grained tasks:

  • NER Chemical
  • NER Disease
  • NED Chemical
  • NED Disease
  • Relation Disease
  • Relation Chemical
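
As a minimal sketch of how such a list arises, pairing every supported task with every tag yields the same six combinations. The task and tag strings below are illustrative stand-ins, not the actual values stored in the bc5cdr data loader, and `fine_grained_tasks` is a hypothetical helper:

```python
from itertools import product

# Illustrative values only: the real _SUPPORTED_TASKS entries are task
# constants in the data loader, and the tag spellings here are assumed.
supported_tasks = ["NER", "NED", "Relation"]
tags = ["Chemical", "Disease"]


def fine_grained_tasks(tasks, tags):
    """Pair every task with every tag, e.g. 'NER Chemical'."""
    return [f"{task} {tag}" for task, tag in product(tasks, tags)]


for name in fine_grained_tasks(supported_tasks, tags):
    print(name)
```

With three tasks and two tags the helper returns six fine-grained task names.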

This is just a temporary solution and we can discuss which tags make sense and which do not.

HOW:

You can access these tags by checking out this PR:

gh pr checkout https://github.com/bigscience-workshop/biomedical/pull/703

As mentioned, these tags are meant to be composed w/ _SUPPORTED_TASKS:

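# Import path assumes the bigbio package layout; adjust if your checkout differs.
from bigbio.dataloader import BigBioConfigHelpers

configs = BigBioConfigHelpers()

for config in configs:
    for task in config.tasks:
        # _TAGS is the module-level attribute introduced by this PR
        for tag in config._py_module._TAGS:
            fine_grained_task = f"{task} {tag}"
            print(config.dataset_name, fine_grained_task)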

TROUBLESHOOTING:

BigBioConfigHelpers should load just fine. If it does not, the most likely cause is a spelling error in the tags file.

To fix this, you just need to edit the file at

bigbio/utils/resources/tags.json

If you find such errors, please ping me on Slack and I will fix them right away.
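
One way to spot such spelling errors before loading anything is to compare each tag against a known vocabulary. The sketch below is only an illustration: the layout of tags.json is not described here, so a mapping from dataset name to a list of tags is assumed, and `KNOWN_TAGS`, the example entries, and `find_misspelled_tags` are all hypothetical.

```python
import difflib

# Hypothetical vocabulary, for illustration only.
KNOWN_TAGS = {"CHEMICAL", "DISEASE", "GENE", "MULTIPLE_CHOICE"}


def find_misspelled_tags(tags_by_dataset, known_tags):
    """Return (dataset, unknown_tag, suggestions) for every unknown tag."""
    problems = []
    for dataset, tags in sorted(tags_by_dataset.items()):
        for tag in tags:
            if tag not in known_tags:
                # Suggest the closest known tag, if any is similar enough.
                suggestions = difflib.get_close_matches(
                    tag, sorted(known_tags), n=1
                )
                problems.append((dataset, tag, suggestions))
    return problems


# Assumed structure; in practice this would come from json.load() on the file.
example = {
    "bc5cdr": ["CHEMICAL", "DISAESE"],
    "some_qa_dataset": ["MULTIPLE_CHOICE"],
}

for dataset, tag, suggestions in find_misspelled_tags(example, KNOWN_TAGS):
    print(f"{dataset}: unknown tag {tag!r}, closest known: {suggestions}")
```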

sg-wbi, Jun 08 '22 14:06

The idea moving forward would be to attach specific tags to specific tasks. This way we can have a test for this information, e.g. the "MULTIPLE_CHOICE" tag should be available only for "QA".
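
A rough sketch of what such a test could check, assuming tags and tasks are plain strings. Only the MULTIPLE_CHOICE/QA pairing comes from this comment; the mapping name and the `invalid_tags` helper are hypothetical:

```python
# Hypothetical mapping from a tag to the tasks it may accompany.
# Only the MULTIPLE_CHOICE -> QA pairing is taken from this thread.
ALLOWED_TASKS_FOR_TAG = {
    "MULTIPLE_CHOICE": {"QA"},
}


def invalid_tags(tasks, tags, allowed=ALLOWED_TASKS_FOR_TAG):
    """Return the restricted tags that have none of their allowed tasks."""
    task_set = set(tasks)
    return [
        tag
        for tag in tags
        if tag in allowed and not (allowed[tag] & task_set)
    ]


# A QA dataset may carry MULTIPLE_CHOICE; an NER-only dataset may not.
print(invalid_tags(["QA"], ["MULTIPLE_CHOICE"]))
print(invalid_tags(["NER"], ["MULTIPLE_CHOICE"]))
```

Tags missing from the mapping are treated as unrestricted here, which keeps the check permissive until a tag is explicitly tied to a task.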

sg-wbi, Jun 08 '22 14:06

One more thing: during the process I was tempted to create SOCIAL_MEDIA and CLINICAL tags, but I think we should instead have a separate metadata attribute dedicated to "domain"/"source".

sg-wbi, Jun 08 '22 14:06