Appraise
Compare unique items not system outputs
Often, several systems produce identical output for a sentence. Rather than constructing each task from a random subset of systems, each task should be constructed from the set of distinct outputs for that sentence. The pairwise rankings could then be re-associated with the systems to generate a larger set of pairwise rankings.
This would be a bit more respectful of people's time (it's annoying to see identical outputs), and would also let us potentially gather data more quickly. On the WMT14 data, for example, there are identical system outputs on over half the sentences.
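To make the proposal concrete, here is a minimal sketch of the grouping step, not Appraise's actual code. It assumes the outputs for one sentence are available as a mapping from system name to output string; the function name `group_identical_outputs` and the sample system names are hypothetical.

```python
from collections import defaultdict


def group_identical_outputs(outputs):
    """Map each distinct output string to the systems that produced it.

    `outputs` is assumed to be a dict {system_name: output_text} for a
    single source sentence (a hypothetical input format).
    """
    groups = defaultdict(list)
    for system, text in outputs.items():
        # Strip surrounding whitespace so trivially different strings merge.
        groups[text.strip()].append(system)
    return {text: sorted(systems) for text, systems in groups.items()}


# Three systems, but only two distinct items for an annotator to rank.
outputs = {"sysA": "the cat sat", "sysB": "the cat sat", "sysC": "a cat sat"}
groups = group_identical_outputs(outputs)
```

Each key of `groups` would then be shown to the annotator once, and the list of systems behind it is kept so the ranking can be mapped back afterwards.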
CC: @cfedermann
This will be fixed for WMT15
I found a workaround for this that creates entries where the system name is a comma-delimited list of systems. You then just have to split those out and compile the (often much larger) set of rankings. If you want what I've done, let me know. That might be a better way than trying to do it internally.
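For readers following along, the split-and-compile step might look roughly like the sketch below. This is an illustration only, not the code referred to above: the input format (a list of `(system_names, rank)` tuples with lower rank meaning better) and the name `expand_rankings` are assumptions.

```python
from itertools import combinations, product


def expand_rankings(ranked_entries):
    """Expand ranked entries into per-system pairwise rankings.

    `ranked_entries` is assumed to be a list of (system_names, rank)
    tuples, where `system_names` is a comma-delimited string and a
    lower rank is better. Returns (system_a, system_b, relation)
    tuples with relation in {"<", ">", "="}.
    """
    pairs = []
    for (names_a, rank_a), (names_b, rank_b) in combinations(ranked_entries, 2):
        relation = "<" if rank_a < rank_b else ">" if rank_a > rank_b else "="
        # Every system behind entry A is compared with every system behind entry B.
        for a, b in product(names_a.split(","), names_b.split(",")):
            pairs.append((a.strip(), b.strip(), relation))
    return pairs


entries = [("sysA,sysB", 1), ("sysC", 2)]
pairs = expand_rankings(entries)
```

Whether systems that share an entry should also be recorded as ties against each other is left open here; the sketch only emits comparisons across entries.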
@mjpost elegant solution; I didn't plan to do this deduping internally. Data will be rendered as-is.
Can you point me to your code for this?
Commit 00131810b5 addresses this during batch generation...
@mjpost I'm preparing sample files next; it would be nice if you could have a quick look when you get a chance...
Yes, please send them. I likely won't have time till later in the day but will prioritize it.
Aloha @mjpost, mini batches are inside the repo (new wmt15data folder). They look good to me but I'm a little worn out by now ;)
Any feedback you might have is very welcome.
Cheers and best, Christian
-----Original Message-----
From: "Matt Post"
Sent: 5/7/2015 5:15 AM
To: "cfedermann/Appraise"
Cc: "Christian Federmann"
Subject: Re: [Appraise] Compare unique items not system outputs (#45)
@mjpost Batch 1 files have been added (wmt15data/full-batches folder).
I have checked that exporting data for "multi systems" generates the right CSV format, possibly spanning more than a single line. I add one or more PLACEHOLDER systems to make sure we end up with five systems per row. The corresponding rank is -1, so this does not have an effect on scoring. Will verify that soon...
Let me know if you spot any issues with the data.
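The padding described above can be pictured with the following sketch. It is not the exporter's real code, and the actual CSV column layout is not shown in this thread; the `(system, rank)` tuple representation and the helper names `to_rows` and `real_entries` are assumptions. Only the PLACEHOLDER name, the row width of five, and the rank of -1 come from the description.

```python
PLACEHOLDER = ("PLACEHOLDER", -1)


def to_rows(system_ranks, width=5):
    """Split (system, rank) pairs into rows of exactly `width` entries.

    A short final row is padded with PLACEHOLDER entries whose rank is -1.
    """
    rows = []
    for start in range(0, len(system_ranks), width):
        row = list(system_ranks[start:start + width])
        row += [PLACEHOLDER] * (width - len(row))
        rows.append(row)
    return rows


def real_entries(row):
    """Drop padding so that rank -1 entries never reach scoring."""
    return [(system, rank) for system, rank in row if rank != -1]


# Seven systems after splitting the comma-delimited names: two rows.
ranks = [("s1", 1), ("s2", 1), ("s3", 2), ("s4", 3), ("s5", 3), ("s6", 4), ("s7", 5)]
rows = to_rows(ranks)
```

Here the second row holds `s6`, `s7`, and three placeholders, which `real_entries` filters out again before any pairwise comparison is made.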
Thanks. Can I get an invite token?