
Data Format For GEC

Open saramoeini20 opened this issue 1 year ago • 0 comments

Hi, I'm working on GEC for a low-resource language and want to create the datasets myself. I have a few questions and would be thankful if you could answer them.

  1. I saw that the training data is in parallel file format. Should the evaluation data be in M2 format? And is M2 format used only for evaluation in GEC?

  2. If I want to generate feedback on errors or show the location of an error in GEC, is the parallel file format still usable, or should I change the format?

  3. What approach do you suggest for training a model for a low-resource language? Can I make use of your model from the paper "A Simple Recipe for Multilingual Grammatical Error Correction"?
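
To make question 2 concrete, here is a minimal sketch of what I mean by showing the error location. It assumes whitespace tokenization and uses Python's standard `difflib` to derive token-level edit spans from one parallel (source, target) pair, then prints them as M2-style lines. The function names `edit_spans` and `to_m2_like` are hypothetical, the error type is a placeholder `UNK`, and this is not the official M2 or ERRANT tooling, so the alignments may differ from what those tools produce.

```python
from difflib import SequenceMatcher


def edit_spans(source: str, target: str):
    """Return (start, end, replacement) token spans where source differs from target.

    Offsets are token indices into the whitespace-tokenized source sentence.
    """
    src = source.split()
    tgt = target.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged tokens carry no edit
        # replace / delete / insert all map to a source span plus replacement text
        spans.append((i1, i2, " ".join(tgt[j1:j2])))
    return spans


def to_m2_like(source: str, target: str) -> str:
    """Format one sentence pair as M2-style lines (placeholder error type UNK)."""
    lines = ["S " + source]
    for start, end, replacement in edit_spans(source, target):
        lines.append(f"A {start} {end}|||UNK|||{replacement}|||REQUIRED|||-NONE-|||0")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_m2_like("She go to school yesterday", "She went to school yesterday"))
```

If something like this is good enough, the parallel files could stay the single source of truth and the span information would be derived from them on demand.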

saramoeini20, May 20 '23 16:05