Appraise Compare unique items not system outputs

Often, the system outputs for a sentence are identical. Rather than constructing each task from a random subset of systems, each task should be constructed from the set of distinct outputs for that sentence. The pairwise rankings could then be re-associated with the systems to generate a larger set of pairwise rankings.
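A minimal sketch of building a task from distinct outputs, assuming each sentence's outputs arrive as a dict from system name to translation (a hypothetical input format, not Appraise's actual data model). `distinct_items` is a hypothetical helper: identical translations collapse into one item that remembers which systems produced it.

```python
from collections import defaultdict


def distinct_items(outputs):
    """Collapse identical system outputs for a single source sentence.

    `outputs` maps system name -> translation (hypothetical input format).
    Returns a list of (systems, translation) pairs, one per distinct
    translation, where `systems` is the sorted list of systems that produced it.
    """
    groups = defaultdict(list)
    for system, text in outputs.items():
        groups[text].append(system)
    # Sort for deterministic output; a real task would shuffle the items anyway.
    return sorted((sorted(systems), text) for text, systems in groups.items())


if __name__ == "__main__":
    outputs = {
        "sysA": "The house is small.",
        "sysB": "The house is small.",
        "sysC": "The home is small.",
        "sysD": "The house is small.",
        "sysE": "Small is the house.",
    }
    for systems, text in distinct_items(outputs):
        print(",".join(systems), "->", text)
```

Here five system outputs reduce to three items for the annotator to rank; the judgment on the first item then applies to sysA, sysB and sysD alike.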

This would be a bit more respectful of people's time (it's annoying to see identical outputs), and would also potentially let us gather data more quickly. On the WMT14 data, for example, there are identical system outputs on over half the sentences.

CC: @cfedermann

Apr 12 '15 13:04 mjpost

This will be fixed for WMT15

May 05 '15 16:05 cfedermann

I found a workaround for this that creates entries where the system name is a comma-delimited list of systems. You then just have to split those out and compile the (often much larger) set of rankings. If you want what I've done, let me know. That might be a better way than trying to do it internally.
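The splitting step described above could look roughly like this. It is a sketch under stated assumptions: a judged row is taken to be a list of `(system_field, rank)` tuples, where `system_field` may hold several comma-delimited system names and lower ranks are better. `expand_pairwise` is a hypothetical helper, not the code referred to in this comment.

```python
from itertools import combinations


def expand_pairwise(row):
    """Expand one ranked row into pairwise system comparisons.

    `row` is a list of (system_field, rank) tuples; `system_field` may be a
    comma-delimited list of systems that produced the same output
    (hypothetical format). Lower rank means better.
    Returns (system1, system2, outcome) tuples with outcome "<", ">" or "=",
    meaning system1 ranked better than, worse than, or tied with system2.
    """
    # Every system inherits the rank of the merged item it belongs to.
    ranked = [
        (system.strip(), rank)
        for field, rank in row
        for system in field.split(",")
    ]
    pairs = []
    for (s1, r1), (s2, r2) in combinations(ranked, 2):
        outcome = "<" if r1 < r2 else ">" if r1 > r2 else "="
        pairs.append((s1, s2, outcome))
    return pairs


if __name__ == "__main__":
    row = [("sysA,sysB,sysD", 1), ("sysC", 2), ("sysE", 3)]
    for pair in expand_pairwise(row):
        print(*pair)
```

Ranking three distinct items yields ten system pairs here. Three of them are ties between systems that shared an output; whether to keep those ties or drop them is a scoring decision this sketch leaves open.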

May 05 '15 16:05 mjpost

@mjpost elegant solution; I didn't plan to do this deduping internally. Data will be rendered as-is.

Can you point me to your code for this?

May 05 '15 16:05 cfedermann

Commit 00131810b5 addresses this during batch generation...

May 07 '15 09:05 cfedermann

@mjpost I'm preparing sample files next; it would be nice if you could have a quick look when you get a chance...

May 07 '15 09:05 cfedermann

Yes, please send them. I likely won't have time till later in the day but will prioritize it.

May 07 '15 12:05 mjpost

Aloha @mjpost, mini batches are inside the repo (new wmt15data folder). They look good to me but I'm a little worn out by now ;)

Any feedback you might have is very welcome.

Cheers and best, Christian

-----Original Message-----
From: "Matt Post"
Sent: 5/7/2015 5:15 AM
To: "cfedermann/Appraise"
Cc: "Christian Federmann"
Subject: Re: [Appraise] Compare unique items not system outputs (#45)


May 07 '15 12:05 cfedermann

@mjpost Batch 1 files have been added (wmt15data/full-batches folder).

I have checked that exporting data for "multi systems" generates the right CSV format, possibly spanning more than one line. I add one or more PLACEHOLDER systems to make sure we end up with five systems per row. Their rank is -1, so they have no effect on scoring. Will verify that soon...
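A minimal sketch of the padding step only, assuming each exported row holds five `(system, rank)` slots and that expanded systems are simply chunked into rows in order. How Appraise actually distributes systems across rows is not described in this thread, and `to_rows` and `scored_slots` are hypothetical names.

```python
ROW_WIDTH = 5  # five system slots per CSV row, as described in the comment
PLACEHOLDER = ("PLACEHOLDER", -1)  # rank -1 marks a padding slot


def to_rows(ranked, width=ROW_WIDTH):
    """Chunk (system, rank) pairs into fixed-width rows, padding the last one."""
    rows = []
    for start in range(0, len(ranked), width):
        row = list(ranked[start:start + width])
        # Fill any remaining slots with placeholders so every row has `width` entries.
        row += [PLACEHOLDER] * (width - len(row))
        rows.append(row)
    return rows


def scored_slots(row):
    """Drop padding so placeholder slots never enter scoring."""
    return [(system, rank) for system, rank in row if rank != -1]


if __name__ == "__main__":
    ranked = [("sysA", 1), ("sysB", 1), ("sysD", 1),
              ("sysC", 2), ("sysE", 3), ("sysF", 4)]
    for row in to_rows(ranked):
        print(row, "->", scored_slots(row))
```

Six systems fill one full row plus a second row with a single real entry and four placeholders; filtering on rank -1 leaves only `("sysF", 4)` in that second row.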

Let me know if you spot any issues with the data.

May 08 '15 08:05 cfedermann

Thanks. Can I get an invite token?

May 12 '15 02:05 mjpost
