
Imbalanced validation and test data

Open ganeshjawahar opened this issue 4 years ago • 0 comments

Section 5.1 of the paper states: "We split the articles in a balanced way, with 10k for training (5k per label), 2k for validation, and 8k for testing."

But the "generator=mega~dataset=p0.94.jsonl" file at https://github.com/rowanz/grover/blob/ea12b0ef8805e1cc83fd2d578af742ba599c79f0/generation_examples/README.md has imbalanced validation and test sets:

- train-human = 5K
- train-machine = 5K
- validation-human = 2K
- validation-machine = 1K
- test-human = 8K
- test-machine = 4K
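For anyone who wants to reproduce these counts, here is a minimal sketch of how I tallied them. It assumes each line of the JSONL file is a JSON object with a `split` key and a `label` key; those key names are my assumption about the file layout, and the helper name `count_by_split_and_label` is just illustrative.

```python
import json
from collections import Counter


def count_by_split_and_label(lines):
    """Count records per (split, label) pair from an iterable of JSONL lines.

    Assumes every record has "split" and "label" keys (an assumption about
    the file layout, not something confirmed by the repo docs).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[(record["split"], record["label"])] += 1
    return counts


# Tiny made-up sample to show the call; not real data from the file.
sample_lines = [
    '{"split": "train", "label": "human"}',
    '{"split": "train", "label": "machine"}',
    '{"split": "val", "label": "human"}',
]
sample_counts = count_by_split_and_label(sample_lines)
```

On the real file, the same function can be called on an open file handle, e.g. `with open("generator=mega~dataset=p0.94.jsonl") as f: counts = count_by_split_and_label(f)`, and the resulting counter printed.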

Why are the validation and test sets imbalanced? Thanks!

ganeshjawahar, Jun 07 '20 06:06