geval
Evaluation with a single prompt
I realize this is quite late, and the repo may no longer be actively maintained given how much the field has moved. I was curious whether you had experimented with a single prompt that uses structured output to evaluate all dimensions (e.g., coherence, factuality) simultaneously. This would be more cost-efficient, since each input would need one model call instead of one per dimension, and I would expect the scores to be comparable to those obtained with an individual prompt for each dimension.
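To make the question concrete, here is a rough sketch of what I have in mind. It is not taken from this repo: the dimension list, the 1-5 scale, the prompt wording, and the helper names `build_prompt` and `parse_scores` are all my own assumptions, and the actual model call is left out on purpose.

```python
import json

# Hypothetical dimension list for illustration; swap in whatever the evaluation uses.
DIMENSIONS = ["coherence", "factuality"]


def build_prompt(source: str, summary: str, dimensions=DIMENSIONS) -> str:
    """Build one prompt that asks for every dimension in a single JSON reply."""
    keys = ", ".join(f'"{d}": <integer 1-5>' for d in dimensions)
    return (
        "Rate the summary of the source text on each dimension from 1 to 5.\n"
        f"Reply with only a JSON object of the form {{{keys}}}.\n\n"
        f"Source:\n{source}\n\nSummary:\n{summary}\n"
    )


def parse_scores(reply: str, dimensions=DIMENSIONS) -> dict:
    """Parse the model's JSON reply and check each score (assumed 1-5 scale)."""
    data = json.loads(reply)
    scores = {}
    for d in dimensions:
        value = data[d]  # raises KeyError if the model skipped a dimension
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"invalid score for {d}: {value!r}")
        scores[d] = value
    return scores
```

The string returned by `build_prompt` would be sent to the model once per input, and `parse_scores` would be applied to the reply. Whether the scores match the per-dimension prompts is exactly what I am unsure about.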