Graham Neubig

121 comments by Graham Neubig

This directory lists all datasets: https://github.com/ExpressAI/DataLab/tree/main/datasets However, more information should be available. See this issue opened upstream: https://github.com/ExpressAI/DataLab/issues/135

This is now available here: https://github.com/ExpressAI/DataLab/blob/main/utils/datasets_info.json At the very least, this still needs to be better documented in ExplainaBoard.

I think this is a good idea. My suggestion would be that we have something simple like: "system_details": {...} where the system details are left completely underspecified for now (other...
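To make the suggestion concrete, here is a minimal sketch of a metadata record carrying a free-form `system_details` object. Only the `system_details` key comes from the comment; the surrounding record, the `task_name` key, the example values, and the `get_system_details` helper are all hypothetical and are not part of any existing API.

```python
import json

# Hypothetical metadata record. Only the "system_details" key comes from the
# comment above; every other key and value is invented for illustration.
metadata = {
    "task_name": "named-entity-recognition",
    "system_details": {
        # Free-form on purpose: no schema is enforced on the contents.
        "learning_rate": 0.0001,
        "pretrained_model": "example-model",
    },
}


def get_system_details(record: dict) -> dict:
    """Return the free-form system details, or an empty dict if absent."""
    details = record.get("system_details", {})
    if not isinstance(details, dict):
        raise TypeError("system_details must be a JSON object")
    return details


# Round-trip through JSON to confirm the record serializes cleanly.
serialized = json.dumps(metadata, sort_keys=True)
restored = json.loads(serialized)
```

Leaving the inner object unconstrained means records without any details still validate, and a stricter schema could be layered on later.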

@jlfu @pfliu-nlp: OK to merge this?

Yes. This example from the MasakhaNER dataset may be useful to help you get started: https://github.com/masakhane-io/masakhane-ner/tree/main/analysis_scripts We're currently in the process of doing major refactoring to make functionality like this...

I think this is a reasonable idea, but because it's a relatively minor use case (most major datasets like WMT, IWSLT, WAT have only one reference), I'd say it's maybe...

I also don't know, it's not me.

You'll want to try out tranX: https://github.com/pcyin/tranX It has better accuracy, is currently supported, and natively supports adding new languages.

Theoretically this would be possible: you would have to use a tokenized Java string instead of a natural language string on the input side, and on the output side you...
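As a rough illustration of the input side only, the sketch below turns a Java snippet into a flat token sequence with a regular expression. The `tokenize_java` function and its pattern are assumptions made for this example, not part of any existing tool, and the pattern is deliberately simplistic: multi-character operators such as `==` are split into single characters, and comments are not handled.

```python
import re

# Hypothetical, simplified Java tokenizer for illustration only.
# Alternatives, in order: double-quoted string literal, number,
# identifier or keyword, any other single non-whitespace character.
TOKEN_PATTERN = re.compile(
    r'"(?:\\.|[^"\\])*"'
    r"|\d+(?:\.\d+)?"
    r"|[A-Za-z_$][A-Za-z0-9_$]*"
    r"|\S"
)


def tokenize_java(source: str) -> list[str]:
    """Split Java source text into a flat list of token strings."""
    return TOKEN_PATTERN.findall(source)


tokens = tokenize_java('String s = foo("a b", 1);')
# The space-joined form is what a sequence model would read as input.
model_input = " ".join(tokens)
```

A real setup would more likely rely on a proper Java lexer, but the resulting space-separated token string has the same shape.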