
Create dataset loader for IndoCollex

Open SamuelCahyawijaya opened this issue 3 years ago • 9 comments

https://indonlp.github.io/nusa-catalogue/card.html?indocollex

SamuelCahyawijaya avatar Jul 10 '22 08:07 SamuelCahyawijaya

#self-assign

haryoa avatar Jul 21 '22 15:07 haryoa

Hi @haryoa, are you still working on this? I will assume inactivity if there's no reply and will free the assignees. Thanks!

bryanwilie avatar Sep 11 '22 07:09 bryanwilie

Cleared assignees due to inactivity. Issue is still open for contribution!

bryanwilie avatar Sep 16 '22 04:09 bryanwilie

This dataset is a morphology-related task. Based on #44, there is no supported Nusantara task for it yet. However, the Nusantara Text Pairs schema could support it, since the dataset splits multi-step inflections into single-transformation pairs. For example, A is not in the dataset (while B and C are), as illustrated below:

| # | Text1 | Text2 | Transformation |
|---|-------|-------|----------------|
| A | teman-teman | temen2 | space-dash, sound-alter |
| B | teman-teman | teman2 | space-dash |
| C | teman2 | temen2 | sound-alter |
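
To make the proposal concrete, here is a minimal sketch of how rows such as B and C might be turned into text-pair records. The field names (`id`, `text_1`, `text_2`, `label`), the `to_pairs_examples` helper, and the idea of storing the transformation as a list of labels are assumptions for illustration only, not the confirmed schema of this repo.

```python
from typing import Dict, Iterator, List, Tuple

# Rows B and C from the table above: (Text1, Text2, Transformation).
RAW_ROWS: List[Tuple[str, str, str]] = [
    ("teman-teman", "teman2", "space-dash"),
    ("teman2", "temen2", "sound-alter"),
]


def to_pairs_examples(
    rows: List[Tuple[str, str, str]],
) -> Iterator[Tuple[int, Dict[str, object]]]:
    """Yield (key, example) tuples, one per text pair.

    Hypothetical layout: field names are assumed, not taken from the repo.
    The transformation string is split on commas so that a pair with
    several transformations (like row A) would carry several labels.
    """
    for idx, (text_1, text_2, transformation) in enumerate(rows):
        labels = [t.strip() for t in transformation.split(",") if t.strip()]
        yield idx, {
            "id": str(idx),
            "text_1": text_1,
            "text_2": text_2,
            "label": labels,
        }


examples = list(to_pairs_examples(RAW_ROWS))
```

Keeping the label as a list means a single-step pair and a multi-step pair share one record shape, which is the main reason a multi-label pairs layout seems like a reasonable fit here.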

Shall I implement this dataset with PAIRS and source schema? tag: @holylovenia

fhudi avatar Sep 18 '22 14:09 fhudi

#self-assign

fhudi avatar Sep 18 '22 14:09 fhudi

Tagging @holylovenia since she might have missed this

bryanwilie avatar Oct 03 '22 04:10 bryanwilie

Hi @fhudi, sorry for the long wait, I missed this one before. I think your idea of using the pairs schema is great. Give me a minute to add a little tweak to the config.

holylovenia avatar Oct 03 '22 04:10 holylovenia

The pairs_multi schema and Tasks.MORPHOLOGICAL_INFLECTION are ready to use, @fhudi! This change of decision also calls for a modification in #44 and #156. Would you mind incorporating the proper changes to these dataloaders? Maybe in a new PR? That way it can count as an extra contribution for your Hacktoberfest milestone in case you're joining. I can create an issue for this fix later. No pressure if you prefer not to, though. 😄

Thanks again for your wonderful suggestion!

PS: Also thanks to @bryanwilie for the kind reminder and help.

holylovenia avatar Oct 03 '22 05:10 holylovenia

@holylovenia Surebeans, I will do the modification 😄

fhudi avatar Oct 09 '22 08:10 fhudi