Graham Neubig comments

Results: 121 comments by Graham Neubig

Better support for listing/inspecting DataLab datasets

This directory lists all datasets (https://github.com/ExpressAI/DataLab/tree/main/datasets), but more information should be available. See the issue opened upstream: https://github.com/ExpressAI/DataLab/issues/135

Better support for listing/inspecting DataLab datasets

This is now available here (https://github.com/ExpressAI/DataLab/blob/main/utils/datasets_info.json), but it still needs to be documented better, at the very least in ExplainaBoard.

Unittest errors due to file downloading

I'm fine with S3.

ExplainaBoard maintains the hyper-parameter features?

I think this is a good idea. My suggestion would be that we have something simple like: "system_details": {...} where the system details are left completely underspecified for now (other...
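As a rough illustration of the suggestion above, here is a minimal sketch of a metadata record that carries a free-form "system_details" field. Only the "system_details" key comes from the comment; the surrounding keys and the hyper-parameter names are hypothetical, and this is not ExplainaBoard's actual schema.

```python
import json

# Hypothetical metadata for one system output. Only "system_details" is
# taken from the comment; every other key and value is an assumption.
metadata = {
    "task_name": "text-classification",
    "system_details": {
        # Left unspecified on purpose: any key/value pairs are accepted,
        # so hyper-parameters can be recorded without a fixed schema.
        "learning_rate": 0.001,
        "batch_size": 32,
    },
}

# Round-trip through JSON to show that the free-form field survives as-is.
serialized = json.dumps(metadata, indent=2)
restored = json.loads(serialized)
```

Keeping the field schema-free means new hyper-parameters can be added later without changing the surrounding format.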

Mentioned dependency on poppler

@jlfu @pfliu-nlp : OK to merge this?

Can this be used on a custom dataset?

Yes. This example from the MasakhaNER dataset may help you get started (https://github.com/masakhane-io/masakhane-ner/tree/main/analysis_scripts). We're currently doing a major refactoring to make functionality like this...

Multiple references

I think this is a reasonable idea, but because it's a relatively minor use case (most major datasets like WMT, IWSLT, WAT have only one reference), I'd say it's maybe...

Sync with Overleaf

I don't know either; it's not me.

How to change the grammar file for another language

You'll want to try out tranX (https://github.com/pcyin/tranX). It has better accuracy, is currently supported, and natively supports adding new languages.

Question: Translate from Java to Java

Theoretically this would be possible: you would have to use a tokenized Java string instead of a natural language string on the input side, and on the output side you...
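For the input side only, a tokenized Java string could be produced along the following lines. This is a minimal sketch with a simplified regular expression, not a full Java lexer and not part of the project being discussed; the function name `tokenize_java` is hypothetical.

```python
import re
from typing import List

# Simplified token pattern (an assumption, not a complete Java lexer).
# Alternatives are tried in order, left to right.
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"'             # double-quoted string literal
    r"|[A-Za-z_$][A-Za-z0-9_$]*"     # identifier or keyword
    r"|\d+(?:\.\d+)?"                # integer or decimal literal
    r"|==|!=|<=|>=|&&|\|\||\+\+|--"  # common two-character operators
    r"|\S"                           # any other single non-space character
)


def tokenize_java(source: str) -> List[str]:
    """Split a Java snippet into a flat list of token strings."""
    return TOKEN_RE.findall(source)


# Space-joined tokens can stand in for the natural language input string.
tokens = tokenize_java('if (a == b) { s = "hi there"; }')
model_input = " ".join(tokens)
```

Comments and character literals are not handled here; a real setup would more likely use an existing Java lexer.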