grover
Imbalanced validation and test data
Section 5.1 has this line: "We split the articles in a balanced way, with 10k for training (5k per label), 2k for validation, and 8k for testing."
But the "generator=mega~dataset=p0.94.jsonl" file at https://github.com/rowanz/grover/blob/ea12b0ef8805e1cc83fd2d578af742ba599c79f0/generation_examples/README.md has imbalanced validation and test sets. Train: 5K human, 5K machine. Validation: 2K human, 1K machine. Test: 8K human, 4K machine.
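For reference, here is a minimal sketch of how per-split counts like the ones above can be tallied. It assumes each line of the file is a JSON object with a `split` field and a `label` field; those field names are an assumption and may need adjusting to match the actual file. The helper names `count_by_split_and_label` and `count_file` are hypothetical.

```python
import json
from collections import Counter


def count_by_split_and_label(lines):
    """Tally records per (split, label) pair from an iterable of JSON lines.

    Assumes every record has "split" and "label" keys (an assumption
    about the file layout, not something confirmed by the README).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[(record["split"], record["label"])] += 1
    return counts


def count_file(path):
    """Open a .jsonl file and return its (split, label) counts."""
    with open(path, encoding="utf-8") as f:
        return count_by_split_and_label(f)
```

Calling `count_file("generator=mega~dataset=p0.94.jsonl")` on a local copy of the file and printing the resulting counter should show whether each split contains equal numbers of human and machine articles.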
Why are the validation and test sets imbalanced? Thanks!