_TAGS for fine-grained tasks
INTRO:
This PR introduces a metadata attribute, _TAGS.
The values in _TAGS are, well, tags that further classify the task.
These tags are meant to be used together with _SUPPORTED_TASKS.
I tried to be as generic as possible in order to keep the set of tags as small as possible while still making sense in combination with _SUPPORTED_TASKS.
For instance, bc5cdr will have the following combination of fine-grained tasks:
- NER Chemical
- NER Disease
- NED Chemical
- NED Disease
- Relation Disease
- Relation Chemical
This is just a temporary solution and we can discuss which tags make sense and which do not.
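As a minimal sketch of how the bc5cdr list above comes about: each fine-grained task is one pairing of a supported task with a tag. The task and tag strings below are illustrative assumptions taken from the list, not the exact values stored in the dataset script, and compose_fine_grained is a hypothetical helper name.

```python
from itertools import product


def compose_fine_grained(tasks, tags):
    """Return one "<task> <tag>" string for every (task, tag) pair."""
    return [f"{task} {tag}" for task, tag in product(tasks, tags)]


# Illustrative values only; the real bc5cdr entries may be spelled differently.
bc5cdr_tasks = ["NER", "NED", "Relation"]
bc5cdr_tags = ["Chemical", "Disease"]

fine_grained = compose_fine_grained(bc5cdr_tasks, bc5cdr_tags)
for name in fine_grained:
    print(name)
```

Three tasks and two tags give six fine-grained tasks, which is why a small tag vocabulary is enough to cover many combinations.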
HOW:
You can access these tags by checking out this PR:
gh pr checkout https://github.com/bigscience-workshop/biomedical/pull/703
As mentioned, these tags are meant to be composed with _SUPPORTED_TASKS:
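from bigbio.dataloader import BigBioConfigHelpers

configs = BigBioConfigHelpers()
for config in configs:
    for task in config.tasks:
        # _TAGS is read from the dataset's loader module
        for tag in config._py_module._TAGS:
            finegrained_task = f"{task} {tag}"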
TROUBLESHOOTING:
BigBioConfigHelpers should load just fine. If it does not, it is probably because of a spelling error.
To fix this, you just need to edit the file bigbio/utils/resources/tags.json.
If you find such errors, please ping me on Slack and I will fix them right away.
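To spot a misspelled tag before pinging anyone, something like the following could help. It is only a sketch: the layout of tags.json is assumed here to be a mapping from dataset name to a list of tags, KNOWN_TAGS is a made-up vocabulary, and find_unknown_tags is a hypothetical helper, so adapt all three to the real file.

```python
import json
from difflib import get_close_matches

# Hypothetical tag vocabulary, for illustration only.
KNOWN_TAGS = {"Chemical", "Disease", "MULTIPLE_CHOICE"}


def find_unknown_tags(tags_by_dataset, known_tags=KNOWN_TAGS):
    """Return (dataset, tag, suggestion) for every tag not in known_tags."""
    problems = []
    for dataset, tags in tags_by_dataset.items():
        for tag in tags:
            if tag not in known_tags:
                # Suggest the closest known spelling, if any is similar enough.
                close = get_close_matches(tag, sorted(known_tags), n=1)
                problems.append((dataset, tag, close[0] if close else None))
    return problems


# Inline sample standing in for the file contents (assumed layout).
sample = json.loads('{"bc5cdr": ["Chemical", "Disaese"]}')
problems = find_unknown_tags(sample)
for dataset, tag, suggestion in problems:
    print(f"{dataset}: unknown tag {tag!r}, did you mean {suggestion!r}?")
```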
The idea moving forward would be to attach specific tags to specific tasks. This way we can have a test for this information, e.g. the "MULTIPLE_CHOICE" tag should be available only for "QA".
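One possible shape for such a test, sketched under assumptions: ALLOWED_TASKS_FOR_TAG is a hypothetical table that does not exist in the repo, the task names are plain strings rather than the real task enum, and tags missing from the table are treated as unrestricted.

```python
# Hypothetical constraint table: tag -> set of tasks it may be combined with.
ALLOWED_TASKS_FOR_TAG = {"MULTIPLE_CHOICE": {"QA"}}


def is_valid_pair(task, tag, allowed=ALLOWED_TASKS_FOR_TAG):
    """Return True if the tag may be combined with the task.

    Tags without an entry in the table are assumed to be unrestricted.
    """
    permitted = allowed.get(tag)
    return permitted is None or task in permitted


print(is_valid_pair("QA", "MULTIPLE_CHOICE"))
print(is_valid_pair("NER", "MULTIPLE_CHOICE"))
```

A unit test could then loop over every (task, tag) pair a dataset produces and fail on the first pair for which is_valid_pair returns False.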
One more thing: during the process I was tempted to create SOCIAL_MEDIA and CLINICAL tags, but I think we should have yet another metadata attribute specifically for "domain"/"source".