Create dataset loader for IndoCollex
https://indonlp.github.io/nusa-catalogue/card.html?indocollex
#self-assign
Hi @haryoa, are you still working on this? I will assume inactivity if there's no reply and will free the assignees. Thanks!
Cleared assignees due to inactivity. Issue is still open for contribution!
This dataset is a morphology-related task and, based on #44, there is no supported Nusantara task for it yet.
However, the Nusantara Text Pairs Schema could support it, since the dataset splits multiple inflections into separate pairs.
For example, A is not in the dataset (while B and C are), as illustrated below:
| # | Text1 | Text2 | Transformation |
|---|---|---|---|
| A | teman-teman | temen2 | space-dash, sound-alter |
| B | teman-teman | teman2 | space-dash |
| C | teman2 | temen2 | sound-alter |
Shall I implement this dataset with the PAIRS and source schemas?
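To make the proposal concrete, here is a minimal sketch of how rows B and C from the table above might map onto pairs-style examples. The field names (`id`, `text_1`, `text_2`, `label`), the helper `to_pairs_examples`, and the idea of storing the transformation as a list of labels are assumptions for illustration, not the confirmed Nusantara schema, and the rows are hard-coded rather than read from the real dataset files.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical rows mirroring B and C in the table above
# (hard-coded for illustration, not loaded from IndoCollex).
RAW_ROWS: List[Tuple[str, str, str]] = [
    ("teman-teman", "teman2", "space-dash"),
    ("teman2", "temen2", "sound-alter"),
]


def to_pairs_examples(
    rows: List[Tuple[str, str, str]],
) -> Iterator[Tuple[int, Dict[str, object]]]:
    """Yield (key, example) tuples: one text pair plus its transformation labels."""
    for idx, (text_1, text_2, transformation) in enumerate(rows):
        # A transformation cell may hold several comma-separated labels.
        labels = [t.strip() for t in transformation.split(",") if t.strip()]
        yield idx, {
            "id": str(idx),
            "text_1": text_1,
            "text_2": text_2,
            "label": labels,
        }


examples = [example for _, example in to_pairs_examples(RAW_ROWS)]
for example in examples:
    print(example)
```

Keeping `label` as a list means a pair with more than one transformation would still fit the same shape, if such rows turn up.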
tag: @holylovenia
#self-assign
Tagging @holylovenia since she might have missed this.
Hi @fhudi, sorry for the long wait, I missed this one before. I think your idea of using the pairs schema is great. Give me a minute to add a little tweak to the config.
The pairs_multi schema and Tasks.MORPHOLOGICAL_INFLECTION are ready to use, @fhudi!
This decision also calls for a modification in #44 and #156. Would you mind incorporating the proper changes to these dataloaders? Maybe in a new PR? That way it can count as an extra contribution for your Hacktoberfest milestone in case you're joining. I can create an issue for this fix later. No pressure if you prefer not to, though. 😄
Thanks again for your wonderful suggestion!
PS: Also thanks to @bryanwilie for the kind reminder and help.
@holylovenia Surebeans, I will do the modification 😄