Graham Neubig
This directory has a list of all datasets: https://github.com/ExpressAI/DataLab/tree/main/datasets However, more information should be available. See this issue opened upstream: https://github.com/ExpressAI/DataLab/issues/135
This is now available here: https://github.com/ExpressAI/DataLab/blob/main/utils/datasets_info.json However, it still needs to be documented better, at the very least in ExplainaBoard.
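Until that documentation exists, a quick way to see what the file contains is to list its top-level keys. This is only a minimal sketch: I'm assuming the file parses as a JSON object, the helper name `summarize_top_level` is made up for illustration, and the sample string below is a placeholder rather than real content from `datasets_info.json`.

```python
import json
from typing import Any, List


def summarize_top_level(raw_json: str) -> List[str]:
    """Return the sorted top-level keys of a JSON object.

    Returns an empty list if the document is not a JSON object
    (for example, if it is a JSON array).
    """
    data: Any = json.loads(raw_json)
    if isinstance(data, dict):
        return sorted(data)
    return []


# Hypothetical placeholder input; the real file's schema may differ.
sample = '{"dataset_b": {"note": "placeholder"}, "dataset_a": {"note": "placeholder"}}'
print(summarize_top_level(sample))
```

To try it on the real file, download it and pass its text, e.g. `summarize_top_level(open("datasets_info.json").read())`.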
I'm fine with S3.
I think this is a good idea. My suggestion would be that we have something simple like: "system_details": {...} where the system details are left completely underspecified for now (other...
@jlfu @pfliu-nlp : OK to merge this?
Yes. This example from the MasakhaNER dataset may be useful to help you get started: https://github.com/masakhane-io/masakhane-ner/tree/main/analysis_scripts We're currently in the process of doing major refactoring to make functionality like this...
I think this is a reasonable idea, but because it's a relatively minor use case (most major datasets like WMT, IWSLT, WAT have only one reference), I'd say it's maybe...
I also don't know, it's not me.
You'll want to try out tranX: https://github.com/pcyin/tranX It has better accuracy, is currently supported, and natively supports adding new languages.
Theoretically this would be possible: you would have to use a tokenized Java string instead of a natural language string on the input side, and on the output side you...