
Smartly prepare `EnvGroup` dataset for eval

Open ob1-s opened this issue 2 months ago • 0 comments

If I got this right, currently if you eval an `EnvGroup` with a `--num-samples` smaller than (or equal to) the size of the EnvGroup's first env's eval dataset, the eval will only run on examples from that first env.

backtrace:

1. `env_group.py#EnvGroup`: it concatenates the per-env `eval_datasets`:

        eval_dataset = concatenate_datasets(eval_datasets) if eval_datasets else None

2. `environment.py#Environment.get_eval_dataset`: it uses `Dataset.select(range(n))`, so if `n` is smaller than or equal to the size of the first env's eval dataset, the result contains only examples from that first env (a minimal demonstration is below).
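
A minimal sketch of the behavior with the `datasets` library (the column names and sizes here are made up, just standing in for two envs in an `EnvGroup`):

    from datasets import Dataset, concatenate_datasets

    # Toy stand-ins for two envs' eval datasets (hypothetical columns/sizes).
    env_a = Dataset.from_dict({"question": [f"a{i}" for i in range(100)], "task": ["env_a"] * 100})
    env_b = Dataset.from_dict({"question": [f"b{i}" for i in range(100)], "task": ["env_b"] * 100})

    combined = concatenate_datasets([env_a, env_b])

    # What get_eval_dataset(n=50) effectively does: the first 50 rows all
    # come from env_a, so env_b never gets evaluated.
    subset = combined.select(range(50))
    print(set(subset["task"]))  # {'env_a'}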

Additionally, I noticed the `seed` param is never passed anywhere in the codebase when calling `get_dataset` or `get_eval_dataset`, so it is effectively always `None`.

    def get_eval_dataset(self, n: int = -1, seed: int | None = None) -> Dataset:
        if self.eval_dataset is None:
            self.logger.warning(
                "eval_dataset is not set, falling back to train dataset"
            )
            return self.get_dataset(n, seed)
        if seed is not None:
            self.eval_dataset = self.eval_dataset.shuffle(seed=seed)
        if n > 0:
            # Cap n to the length of the dataset to prevent IndexError
            n = min(n, len(self.eval_dataset))
            return self.eval_dataset.select(range(n))
        return self.eval_dataset

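For what it's worth, the shuffle branch in the method above would already mix the envs if a seed actually reached it. Continuing the toy example (the seed value is arbitrary):

    # If a seed were passed through, shuffle-then-select would draw from all envs.
    shuffled = combined.shuffle(seed=42)
    subset = shuffled.select(range(50))
    print(set(subset["task"]))  # almost certainly {'env_a', 'env_b'}
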
I think the easiest way to solve this might be to have `get_eval_dataset` fall back (via `getattr`) to a `seed` attribute on the env itself when no seed is passed in, so the concatenated dataset gets shuffled before selection. Something like the sketch below?
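
Rough sketch of what I mean (untested; the `seed` attribute name is an assumption, I'm not sure what the env would actually expose):

    def get_eval_dataset(self, n: int = -1, seed: int | None = None) -> Dataset:
        # Fall back to a seed defined on the env itself, if the caller didn't pass one.
        # (Hypothetical attribute name; adjust to whatever the env actually stores.)
        if seed is None:
            seed = getattr(self, "seed", None)
        if self.eval_dataset is None:
            self.logger.warning(
                "eval_dataset is not set, falling back to train dataset"
            )
            return self.get_dataset(n, seed)
        if seed is not None:
            self.eval_dataset = self.eval_dataset.shuffle(seed=seed)
        if n > 0:
            # Cap n to the length of the dataset to prevent IndexError
            n = min(n, len(self.eval_dataset))
            return self.eval_dataset.select(range(n))
        return self.eval_dataset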

ob1-s · Oct 21 '25, 19:10