Clarification or Improvement: Should preprocess_model_input Also Affect Inputs to Scorer Functions?
Description
Currently, in the Weave evaluation framework (weave.flow.eval), the preprocess_model_input function provided by users only transforms inputs passed into the model's prediction function. Scorer functions, however, always receive the original, unprocessed input example directly from the dataset.
Current Behavior
- preprocess_model_input transforms the input before it's passed to the model's predict function.
- Scorer functions receive the unprocessed original dataset input, ignoring any preprocessing step. Is my understanding correct?
Relevant Code Snippet:
# apply preprocessing for model input
apply_model_result = await apply_model_async(model, example, self.preprocess_model_input)
# scorer gets original input without preprocessing
for scorer in self.scorers:
    apply_scorer_result = await model_call.apply_scorer(scorer, example)
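To make the described flow concrete, here is a minimal sketch in plain Python that imitates it without importing Weave. Everything in it (evaluate_one, the stand-in predict and scorer, and the question field) is a hypothetical name chosen for illustration, not Weave's actual API; it only mirrors the behavior described above, where the model sees the preprocessed input and the scorer sees the raw dataset row.

```python
import asyncio


def preprocess_model_input(example: dict) -> dict:
    # Hypothetical preprocessing step: trim and lowercase the question.
    return {"question": example["question"].strip().lower()}


async def predict(question: str) -> str:
    # Stand-in model: echoes back whatever input it actually received.
    return question


def scorer(example: dict, output: str) -> dict:
    # Stand-in scorer: it is handed the raw dataset row, not the preprocessed one.
    return {
        "scorer_saw": example["question"],
        "match": output == example["question"],
    }


async def evaluate_one(example: dict) -> dict:
    # The model gets the preprocessed input...
    model_input = preprocess_model_input(example)
    output = await predict(**model_input)
    # ...while the scorer is called with the original example.
    return {"output": output, "score": scorer(example, output)}


result = asyncio.run(evaluate_one({"question": "  What Is Weave?  "}))
print(result)
```

In this toy setup the exact-match comparison fails only because the scorer compares the model output against the unnormalized question, which is the kind of subtle mismatch this issue is about.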
Issue
This behavior can be confusing: users may reasonably expect scorers to evaluate against the preprocessed inputs as well, especially when preprocessing performs normalization, cleaning, or formatting that both the model and the evaluation metrics depend on.
Suggested Resolution
- Clarify in documentation explicitly that scorer functions always receive original inputs.
- Or consider adjusting the behavior to allow an option for scorers to receive preprocessed inputs, potentially through an additional argument or flag within the Evaluation class.
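Until one of these options exists, a user-side workaround might look like the sketch below: wrap the scorer so it applies the same preprocessing itself. This is plain Python, not a Weave feature; with_preprocessed_input, normalize, and exact_match are hypothetical helpers, and the (example, output) scorer signature is an assumption made for illustration.

```python
from typing import Any, Callable

Example = dict[str, Any]
Scorer = Callable[[Example, str], dict]


def with_preprocessed_input(
    scorer: Scorer, preprocess: Callable[[Example], Example]
) -> Scorer:
    # Hypothetical helper: returns a scorer that preprocesses the example first.
    def wrapped(example: Example, output: str) -> dict:
        return scorer(preprocess(example), output)

    return wrapped


def normalize(example: Example) -> Example:
    # Same normalization the model input would receive (assumed for this sketch).
    return {**example, "question": example["question"].strip().lower()}


def exact_match(example: Example, output: str) -> dict:
    # Toy scorer: compares the model output with the question field.
    return {"match": output == example["question"]}


raw = {"question": "  What Is Weave?  "}
model_output = "what is weave?"

# Scoring against the raw row versus the preprocessed row.
raw_score = exact_match(raw, model_output)
wrapped_score = with_preprocessed_input(exact_match, normalize)(raw, model_output)
print(raw_score, wrapped_score)
```

Keeping the wrapper explicit means each scorer opts in to preprocessing individually, which avoids silently changing scorers that rely on the original dataset fields.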
Additional Context
This clarification or adjustment will help users avoid subtle bugs or misunderstandings when setting up evaluations, especially in complex preprocessing scenarios.
Related Links
- [weave/flow/eval.py](https://github.com/wandb/weave/blob/master/weave/flow/eval.py)
- [weave/flow/model.py](https://github.com/wandb/weave/blob/master/weave/flow/model.py)
@oekekezie thanks for taking the time to read our docs and provide helpful feedback.
We've updated the docs to explicitly clarify that scorer functions always receive original inputs in https://github.com/wandb/weave/pull/4322. The PR will be reviewed and merged shortly.