Smartly prepare `EnvGroup` dataset for eval
If I got this right, currently if you try to eval an `EnvGroup` with a `--num-samples` smaller than the size of the `EnvGroup`'s first env's eval dataset, the eval will run only on examples from that first env.
Backtrace:
- `env_group.py#EnvGroup`: it concatenates the sub-envs' eval datasets:
  `eval_dataset = concatenate_datasets(eval_datasets) if eval_datasets else None`
- `environment.py#Environment.get_eval_dataset`: it uses `Dataset.select(range(n))`. If `n` is less than or equal to the size of the first env's dataset, this returns a dataset containing only examples from that env (see the sketch below).
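A minimal sketch of the behavior, using two hypothetical stand-in datasets in place of the sub-envs' eval datasets:

```python
from datasets import Dataset, concatenate_datasets

# Hypothetical stand-ins for two sub-envs' eval datasets.
env_a = Dataset.from_dict({"question": [f"a{i}" for i in range(100)], "task": ["env_a"] * 100})
env_b = Dataset.from_dict({"question": [f"b{i}" for i in range(100)], "task": ["env_b"] * 100})

combined = concatenate_datasets([env_a, env_b])

# With n <= len(env_a), select(range(n)) never reaches env_b's rows.
print(set(combined.select(range(50))["task"]))  # {'env_a'}
```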
Additionally, I noticed the `seed` param is not passed anywhere in the codebase when calling `get_dataset` or `get_eval_dataset`, so it is effectively always `None`, meaning the shuffle branch below never runs:
```python
def get_eval_dataset(self, n: int = -1, seed: int | None = None) -> Dataset:
    if self.eval_dataset is None:
        self.logger.warning(
            "eval_dataset is not set, falling back to train dataset"
        )
        return self.get_dataset(n, seed)
    if seed is not None:
        self.eval_dataset = self.eval_dataset.shuffle(seed=seed)
    if n > 0:
        # Cap n to the length of the dataset to prevent IndexError
        n = min(n, len(self.eval_dataset))
        return self.eval_dataset.select(range(n))
    return self.eval_dataset
```
I think maybe the easiest way to solve this is to have `get_eval_dataset` fall back (via `getattr`) to a `seed` attribute on the env itself when the caller doesn't pass one, roughly as sketched below?
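A minimal sketch of that idea, as a drop-in variant of the method above. It assumes envs may optionally carry a `seed` attribute, which is hypothetical and not a confirmed attribute of `Environment`. A side effect is that shuffling with the fallback seed before `select` would also interleave the sub-envs' examples in the `EnvGroup` case:

```python
def get_eval_dataset(self, n: int = -1, seed: int | None = None) -> Dataset:
    if self.eval_dataset is None:
        self.logger.warning(
            "eval_dataset is not set, falling back to train dataset"
        )
        return self.get_dataset(n, seed)
    # Hypothetical fallback: use a seed set on the env itself when the
    # caller doesn't pass one, so the shuffle branch can actually run.
    if seed is None:
        seed = getattr(self, "seed", None)
    if seed is not None:
        # Shuffling the concatenated dataset mixes examples from all
        # sub-envs before select() below takes the first n rows.
        self.eval_dataset = self.eval_dataset.shuffle(seed=seed)
    if n > 0:
        # Cap n to the length of the dataset to prevent IndexError
        n = min(n, len(self.eval_dataset))
        return self.eval_dataset.select(range(n))
    return self.eval_dataset
```

An alternative would be for `EnvGroup` itself to override `get_eval_dataset` and take an even share of `n` from each sub-env, which wouldn't rely on shuffling at all.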