AvgMinMax median approximation is inconsistent
**Describe the bug**
The median value in dataset metrics (train_data_utils.py) produces different results on each run, even with identical input data. This causes validation failures when comparing metrics files. The _validate_aggregate_metrics function detects differences in the median field and raises a ValueError about conflicting aggregate metrics.
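For illustration, here is a minimal sketch of the kind of check that trips (simplified and hypothetical; the actual logic is `_validate_aggregate_metrics` in `train_data_utils.py`, which compares full metrics files):

```python
def validate_aggregate_metrics(existing: dict, computed: dict) -> None:
    """Simplified model: compare stored metrics against freshly computed ones."""
    differences = []
    for field_name, old_value in existing.items():
        new_value = computed.get(field_name)
        # Exact equality on floats: any run-to-run drift in an approximate
        # statistic such as the t-digest median registers as a mismatch.
        if new_value != old_value:
            differences.append(
                f"Numeric mismatch at {field_name}: {old_value} != {new_value}"
            )
    if differences:
        raise ValueError(f"Differences found in aggregate metrics:\n{differences}")
```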
**Steps/Code to reproduce bug**
Run data preparation on any dataset, e.g.:
```bash
config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\
resources_servers/library_judge_math/configs/bytedtsinghua_dapo17k.yaml"
ng_prepare_data "+config_paths=[${config_paths}]" \
    +output_dirpath=data/bytedtsinghua_dapo17k \
    +mode=train_preparation +should_download=true
```
Because the failure depends on a nondeterministic median estimate, this intermittently raises a ValueError about conflicting aggregate metrics:
```
Differences found in aggregate metrics:
[
    'Numeric mismatch at {field_name}.Median: 80.33 != 80.44'
]
...
Found conflicting aggregate metrics that need to be corrected:
- resources_servers/math_with_judge/data/dapo17k_train_metrics_conflict.json
- resources_servers/math_with_judge/data/dapo17k_validation_metrics_conflict.json
This could be due to a change in how metrics are calculated, leading to outdated metrics. Try deleting the below file(s) and rerunning data preparation:
- resources_servers/math_with_judge/data/dapo17k_train_metrics.json
- resources_servers/math_with_judge/data/dapo17k_validation_metrics.json
```
**Expected behavior**
Metrics should be deterministic. Running data preparation multiple times on the same dataset should produce identical metrics, including the median. The validation check should pass when re-running with unchanged data.
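One way to make this deterministic (a sketch only, not the project's implementation; `exact_median` is a hypothetical helper) is to compute the median exactly rather than estimating it:

```python
import statistics

def exact_median(values: list[float]) -> float:
    # Hypothetical replacement for the t-digest estimate: sorting and
    # taking the middle element depends only on the data, so repeated
    # runs over identical input always produce the same value.
    return statistics.median(values)
```

An exact median costs a sort instead of a streaming update, which is usually acceptable at dataset-preparation scale.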
**Configs**
Any dataset configuration.
**Environment details**
N/A
**Additional context**
The AvgMinMax class uses TDigest for median estimation. A t-digest computes an approximate quantile sketch rather than an exact median, and its estimates are not guaranteed to be identical across runs: common implementations randomize centroid compression, so the reported median can drift slightly even for identical input.
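The drift is easy to observe directly. The sketch below assumes the `tdigest` package from PyPI; the project may use a different t-digest implementation, so treat the class and method names as assumptions:

```python
import random
from tdigest import TDigest

# Fixed input data: a private RNG instance keeps the data identical
# without pinning the global random state the digest may draw from.
rng = random.Random(0)
values = [rng.uniform(0, 100) for _ in range(10_000)]

# Build two digests from the exact same values in the same order.
estimates = []
for _ in range(2):
    digest = TDigest()
    digest.batch_update(values)
    estimates.append(digest.percentile(50))

# The estimates can differ slightly because centroid compression is
# randomized internally, even though the input never changed.
print(estimates)
```

If the two printed values differ, exact-equality validation of stored metrics will fail on re-runs, exactly as described above.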