colpali Use ViDoRe benchmark to monitor performance during training

Adds code to monitor real retrieval metrics on datasets (e.g. the ViDoRe benchmark) during training.

This feature is disabled by default and is designed for power users.

To use it, add the following to your training config:

vidore_eval_frequency: 200 #frequency of the benchmark eval
eval_dataset_format: "qa" #format of the benchmark datasets (qa or beir)

An example can be found at scripts/configs/qwen2/train_colqwen2_model_eval_vidore.yaml
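
For readers unfamiliar with frequency-gated evaluation, here is a minimal sketch of the idea behind `vidore_eval_frequency`. It is not the code from this PR: the class `PeriodicBenchmarkEval`, its `on_step_end` method, and the assumption that the frequency is counted in training steps are all hypothetical and used only for illustration.

```python
from typing import Callable, Dict, List, Optional


class PeriodicBenchmarkEval:
    """Hypothetical helper: runs `eval_fn` every `frequency` training steps."""

    def __init__(
        self,
        eval_fn: Callable[[], Dict[str, float]],
        frequency: Optional[int] = None,
    ) -> None:
        if frequency is not None and frequency <= 0:
            raise ValueError("frequency must be a positive integer or None")
        self.eval_fn = eval_fn
        # None keeps the feature disabled, mirroring "disabled by default".
        self.frequency = frequency
        self.history: List[Dict[str, float]] = []

    def on_step_end(self, global_step: int) -> Optional[Dict[str, float]]:
        # Skip when disabled, at step 0, or between evaluation points.
        if (
            self.frequency is None
            or global_step == 0
            or global_step % self.frequency != 0
        ):
            return None
        metrics = {"step": float(global_step), **self.eval_fn()}
        self.history.append(metrics)
        return metrics


# Usage with a dummy metric function standing in for a real benchmark run.
hook = PeriodicBenchmarkEval(eval_fn=lambda: {"ndcg_at_5": 0.0}, frequency=200)
evaluated_steps = [s for s in range(1, 601) if hook.on_step_end(s) is not None]
```

With `frequency=200` and 600 simulated steps, the dummy evaluation runs at steps 200, 400 and 600; leaving `frequency` unset means it never runs.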

Feb 14 '25 15:02 QuentinJGMace

Recap from our conversation 👋🏼

Let's:

- remove the legacy evaluation code
- add an optional training arg run_vidore_evaluator: if False, do not add the custom callback
- add optional training args vidore_eval_dataset_name and vidore_eval_collection_name (if both are provided, raise an error)
- add an optional training arg to control how often the eval runs (e.g. once every 5 eval steps)
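
A rough sketch of how the args listed above could fit together, to make the proposal concrete. This is only an illustration under assumptions, not the agreed implementation: the dataclass `VidoreEvalArgs`, the field `vidore_eval_every_n_evals`, and the method `should_run` are hypothetical names, and the flag is spelled `run_vidore_evaluator` here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VidoreEvalArgs:
    """Hypothetical container for the optional training args discussed above."""

    run_vidore_evaluator: bool = False  # if False, the custom callback is not added
    vidore_eval_dataset_name: Optional[str] = None
    vidore_eval_collection_name: Optional[str] = None
    vidore_eval_every_n_evals: int = 1  # hypothetical name: run once every N eval steps

    def __post_init__(self) -> None:
        # A dataset name and a collection name are mutually exclusive.
        if self.vidore_eval_dataset_name and self.vidore_eval_collection_name:
            raise ValueError(
                "Provide either vidore_eval_dataset_name or "
                "vidore_eval_collection_name, not both."
            )
        if self.vidore_eval_every_n_evals < 1:
            raise ValueError("vidore_eval_every_n_evals must be >= 1")

    def should_run(self, eval_index: int) -> bool:
        """Return True if the benchmark should run on this (1-based) eval step."""
        return (
            self.run_vidore_evaluator
            and eval_index % self.vidore_eval_every_n_evals == 0
        )


# Usage: run the benchmark once every 5 eval steps.
args = VidoreEvalArgs(run_vidore_evaluator=True, vidore_eval_every_n_evals=5)
benchmark_evals = [i for i in range(1, 16) if args.should_run(i)]
```

Validating in `__post_init__` makes a conflicting config fail at load time rather than at the first evaluation.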

Feb 17 '25 13:02 tonywu71

@QuentinJGMace vidore-benchmark v5.0.0 has been released, don't forget to bump this dep in pyproject.toml 😉

Feb 19 '25 10:02 tonywu71

@QuentinJGMace @tonywu71 any updates?

Apr 02 '25 09:04 ManuelFay