nusa-crowd: Create dataset loader for IndoCollex

https://indonlp.github.io/nusa-catalogue/card.html?indocollex

Jul 10 '22 08:07 SamuelCahyawijaya

#self-assign

Jul 21 '22 15:07 haryoa

Hi @haryoa, are you still working on this? I will assume inactivity if there's no reply and will free the assignees. Thanks!

Sep 11 '22 07:09 bryanwilie

Cleared assignees due to inactivity. Issue is still open for contribution!

Sep 16 '22 04:09 bryanwilie

This dataset covers a morphology-related task. Based on #44, no Nusantara task supports it for now. However, the Nusantara Text Pairs schema could support it, since this dataset splits multiple inflections into separate pairs, i.e., A is not in the dataset (while B and C are), as illustrated below:

| # | Text1 | Text2 | Transformation |
|---|-------|-------|----------------|
| A | teman-teman | temen2 | space-dash, sound-alter |
| B | teman-teman | teman2 | space-dash |
| C | teman2 | temen2 | sound-alter |
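Storing only the single-step rows loses nothing, because a multi-step pair such as row A in the table can be recovered by chaining B and C (the output of B is the input of C). A minimal sketch of that idea follows; the `InflectionPair` class and `compose` helper are hypothetical illustrations, not part of nusa-crowd.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InflectionPair:
    """Hypothetical record: text1 is rewritten into text2 by the listed transformations."""
    text1: str
    text2: str
    transformations: Tuple[str, ...]


def compose(first: InflectionPair, second: InflectionPair) -> Optional[InflectionPair]:
    """Chain two steps when the output of `first` is the input of `second`.

    Returns None when the two pairs do not connect.
    """
    if first.text2 != second.text1:
        return None
    return InflectionPair(
        text1=first.text1,
        text2=second.text2,
        transformations=first.transformations + second.transformations,
    )


# Rows B and C from the table; row A itself is not stored in the dataset.
b = InflectionPair("teman-teman", "teman2", ("space-dash",))
c = InflectionPair("teman2", "temen2", ("sound-alter",))
a = compose(b, c)
print(a)
```

Chaining B then C yields `teman-teman` → `temen2` with both transformations, matching row A; chaining them in the opposite order returns `None` because the texts do not line up.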

Shall I implement this dataset with the PAIRS and source schemas? Tagging @holylovenia.

Sep 18 '22 14:09 fhudi

#self-assign

Sep 18 '22 14:09 fhudi

Tagging @holylovenia since she might have missed this.

Oct 03 '22 04:10 bryanwilie

Hi @fhudi, sorry for the long wait, I missed this one before. I think your idea of using the pairs schema is great. Give me a minute to add a little tweak to the config.

Oct 03 '22 04:10 holylovenia

The pairs_multi schema and Tasks.MORPHOLOGICAL_INFLECTION are ready to use, @fhudi! This change of decision also calls for modifications to #44 and #156. Would you mind incorporating the corresponding changes into those dataloaders? Maybe in a new PR? That way it can count as an extra contribution toward your Hacktoberfest milestone in case you're joining. I can create an issue for this fix later. No pressure if you prefer not to, though. 😄
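As a rough illustration of how a raw IndoCollex row might map onto a multi-label pair record, here is a sketch. It assumes a tab-separated `text1<TAB>text2<TAB>transformations` input line and a record with `id`, `text_1`, `text_2`, and a list-valued `label`. Both the input format and these key names are assumptions, not the confirmed pairs_multi field names in nusa-crowd.

```python
from typing import Dict, List, Union

# Hypothetical record layout (assumed keys, not confirmed nusa-crowd fields):
# the label is a list so that one pair can carry several transformations.
Record = Dict[str, Union[str, List[str]]]


def to_pairs_multi(idx: int, row: str) -> Record:
    """Convert an assumed tab-separated 'text1<TAB>text2<TAB>transformations' row."""
    text_1, text_2, transformations = row.split("\t")
    # Split the comma-separated transformation names into individual labels.
    labels = [t.strip() for t in transformations.split(",") if t.strip()]
    return {"id": str(idx), "text_1": text_1, "text_2": text_2, "label": labels}


print(to_pairs_multi(1, "teman-teman\tteman2\tspace-dash"))
```

Keeping the label as a list lets the same layout hold a single transformation (as in rows B and C) or several, should any pair in the data carry more than one.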

Thanks again for your wonderful suggestion!

PS: Also thanks to @bryanwilie for the kind reminder and help.

Oct 03 '22 05:10 holylovenia

@holylovenia Surebeans, I will do the modification 😄

Oct 09 '22 08:10 fhudi