
Closes #227 Data loader for Karonese sentiment

Open · aliakbars opened this pull request 3 years ago · 5 comments

Please name your PR after the issue it closes. You can use the following line: "Closes #ISSUE-NUMBER", replacing ISSUE-NUMBER with the number of the issue for your dataset.

Checkbox

  • [x] Confirm that this PR is linked to the dataset issue.
  • [x] Create the dataloader script nusantara/nusa_datasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
  • [x] Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _NUSANTARA_VERSION variables.
  • [x] Implement _info(), _split_generators() and _generate_examples() in dataloader script.
  • [x] Make sure that the BUILDER_CONFIGS class attribute is a list with at least one NusantaraConfig for the source schema and one for a nusantara schema.
  • [x] Confirm dataloader script works with datasets.load_dataset function.
  • [x] Confirm that your dataloader script passes the test suite run with python -m tests.test_nusantara --path=nusantara/nusa_datasets/my_dataset/my_dataset.py.
  • [ ] If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.
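The checklist asks for a builder that exposes a source schema and a nusantara schema through `_generate_examples()`. The sketch below illustrates that contract without depending on the `datasets` library: it yields `(key, example)` pairs and either passes source columns through unchanged or reshapes them into an `id`/`text`/`label` record. The CSV layout, the column names, the placeholder rows, the schema strings `"source"` and `"nusantara_text"`, and the helper name `generate_examples` are all assumptions for illustration, not the actual Karonese loader.

```python
import csv
import io
from typing import Dict, Iterator, Tuple

# Hypothetical stand-in for the downloaded data file; the real file layout
# and column names of the Karonese sentiment data are assumptions.
SAMPLE_CSV = """id,text,label
1,placeholder text one,positive
2,placeholder text two,negative
"""


def generate_examples(csv_text: str, schema: str) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (key, example) pairs, mirroring the contract of _generate_examples()."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for key, row in enumerate(reader):
        if schema == "source":
            # Source schema: pass the original columns through unchanged.
            yield key, dict(row)
        elif schema == "nusantara_text":
            # Assumed text-classification schema with id / text / label fields.
            yield key, {"id": str(key), "text": row["text"], "label": row["label"]}
        else:
            raise ValueError(f"Unknown schema: {schema}")
```

In the real loader, each schema would correspond to one entry in `BUILDER_CONFIGS`, and the test command from the checklist exercises the actual script rather than this sketch.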

aliakbars · Aug 29 '22 19:08

This dataset is a bit noisy at the moment: besides inconsistent labeling (numeric vs. string labels), some rows have no label at all. I've sent a PR to the dataset repository (https://github.com/imkarokaro123/karonese/pull/1) that cleans the data and also masks usernames for extra privacy.
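A minimal sketch of the three cleaning steps mentioned above (unifying numeric and string labels, dropping unlabeled rows, and masking usernames), assuming a simple list-of-dicts input. The numeric-to-label mapping, the `@USER` placeholder, the regex, and the helper names are hypothetical; the thread does not say how the linked PR actually implements them.

```python
import re
from typing import Dict, List, Optional

# Hypothetical mapping: the actual numeric codes used in the dataset are not
# stated in the thread, so this correspondence is an assumption.
NUMERIC_TO_LABEL = {"0": "negative", "1": "neutral", "2": "positive"}
VALID_LABELS = set(NUMERIC_TO_LABEL.values())

# Matches @handles not preceded by a word character (so e-mail addresses are
# left alone); the placeholder token is also an assumption.
USERNAME_RE = re.compile(r"(?<!\w)@\w+")


def normalize_label(raw: Optional[str]) -> Optional[str]:
    """Map numeric or string labels onto one vocabulary; None if missing or unknown."""
    if raw is None:
        return None
    value = raw.strip().lower()
    if not value:
        return None
    value = NUMERIC_TO_LABEL.get(value, value)
    return value if value in VALID_LABELS else None


def mask_usernames(text: str, placeholder: str = "@USER") -> str:
    """Replace every @handle in the text with a fixed placeholder."""
    return USERNAME_RE.sub(placeholder, text)


def clean_rows(rows: List[Dict[str, Optional[str]]]) -> List[Dict[str, str]]:
    """Drop rows without a usable label and mask usernames in the rest."""
    cleaned = []
    for row in rows:
        label = normalize_label(row.get("label"))
        if label is None:
            continue  # no usable label: drop the row
        cleaned.append({"text": mask_usernames(row["text"]), "label": label})
    return cleaned
```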

afaji · Aug 30 '22 05:08

@afaji: So let's just use the version from your fork and move forward with the PR, shall we?

SamuelCahyawijaya · Sep 13 '22 03:09

Waiting for this PR to be approved: https://github.com/imkarokaro123/karonese/pull/3

aliakbars · Sep 22 '22 18:09

Hi @aliakbars: Perhaps we can just use the data from your fork for now, since we haven't heard back from the author of the dataset.

SamuelCahyawijaya · Oct 08 '22 12:10

Updated. Should be working properly now, @SamuelCahyawijaya @holylovenia @muhsatrio.

aliakbars · Oct 08 '22 13:10

/test dataset=karonese_sentiment

SamuelCahyawijaya · Oct 22 '22 07:10