
New BERT example using GoEmotions

Open pmg1991 opened this issue 5 years ago • 7 comments

Can we have a new BERT example using GoEmotions? https://github.com/google-research/google-research/tree/master/goemotions

A Colab or Binder notebook for a multi-label classification example, including training and inference, would be awesome.

pmg1991 avatar Oct 05 '20 12:10 pmg1991

@pmg1991 This is indeed a nice example for BERT. We are trying to add more NLP examples and will prioritize this request.

In the meantime, are you interested in contributing by adding this dataset to DJL?

frankfliu avatar Oct 06 '20 21:10 frankfliu

@frankfliu Sure, I'd like to contribute.

pmg1991 avatar Oct 07 '20 10:10 pmg1991

@pmg1991 I created a CsvDataset in https://github.com/awslabs/djl/pull/208

You should be able to extend CsvDataset and create a GoEmotions dataset; you can use https://github.com/awslabs/djl/blob/master/basicdataset/src/main/java/ai/djl/basicdataset/AmesRandomAccess.java as an example.
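To illustrate the data-loading part, here is a minimal sketch of just the parsing step. The class and method names are hypothetical, and the column layout (text, comma-separated emotion ids, then an id column) is an assumption based on the processed GoEmotions files; the actual DJL wiring should extend CsvDataset as in the PR above and the AmesRandomAccess example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper sketching the parsing step for the processed GoEmotions
 * .tsv files. Assumed column layout: text, comma-separated emotion ids, example id.
 */
public final class GoEmotionsParser {

    /** 27 emotion categories plus neutral in the processed GoEmotions data. */
    public static final int NUM_LABELS = 28;

    private GoEmotionsParser() {}

    /** One parsed example: raw text plus a multi-hot label vector. */
    public static final class Example {
        public final String text;
        public final float[] labels;

        Example(String text, float[] labels) {
            this.text = text;
            this.labels = labels;
        }
    }

    /** Reads train.tsv/dev.tsv/test.tsv and turns each row into an Example. */
    public static List<Example> parse(Path tsvFile) throws IOException {
        List<Example> examples = new ArrayList<>();
        for (String line : Files.readAllLines(tsvFile)) {
            String[] columns = line.split("\t");
            if (columns.length < 2) {
                continue; // skip malformed rows
            }
            // Convert the comma-separated label ids into a multi-hot vector.
            float[] multiHot = new float[NUM_LABELS];
            for (String id : columns[1].split(",")) {
                multiHot[Integer.parseInt(id.trim())] = 1f;
            }
            examples.add(new Example(columns[0], multiHot));
        }
        return examples;
    }
}
```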

frankfliu avatar Oct 14 '20 02:10 frankfliu

Hi @zachgk, I'm interested in this issue and want to work on it, so could you assign it to me? Thanks!

Konata-CG avatar Apr 17 '22 12:04 Konata-CG

Yeah, here you go @Konata-CG

zachgk avatar Apr 17 '22 16:04 zachgk

I found that this dataset contains several raw and processed datasets. They describe the processed datasets as follows: "The data we used for training the models includes examples where there is an agreement between at least 2 raters. Our data includes 43,410 training examples (train.tsv), 5426 dev examples (dev.tsv) and 5427 test examples (test.tsv)." Which datasets should I use, raw or processed?

Konata-CG avatar Apr 19 '22 05:04 Konata-CG

I would recommend the processed data. One of the big problems when working with datasets is that the data is often very noisy. In this example, one source of noise would be that some examples can't be clearly assigned to an emotion. So, the processed data, where they remove the examples that aren't suitable for the task, saves everyone who uses your Dataset class from having to do the same processing themselves.

Then, you would have the three train/validate/test .tsv files correspond to the different DJL dataset Usage values.
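For instance, the mapping could look like this (a minimal sketch; the class and method name are hypothetical, but Dataset.Usage is the DJL enum):

```java
import ai.djl.training.dataset.Dataset;

// Sketch of mapping DJL's Dataset.Usage values to the processed GoEmotions files.
final class GoEmotionsFiles {

    private GoEmotionsFiles() {}

    static String fileName(Dataset.Usage usage) {
        switch (usage) {
            case TRAIN:
                return "train.tsv";
            case VALIDATION:
                return "dev.tsv";
            case TEST:
                return "test.tsv";
            default:
                throw new IllegalArgumentException("Unsupported usage: " + usage);
        }
    }
}
```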

zachgk avatar Apr 20 '22 00:04 zachgk