geval issues

Evaluation with single prompt

I realize this is quite late, and it may no longer be actively maintained given how much the field has moved. I was curious if you had experimented using a...

macabdul9

Fluency outcome different from prompt instructions

Per the prompt instructions, fluency is the only score rated 1-3 rather than 1-5 like the others. The output in the summeval.json file, however, indicates that fluency...

jealyvda

How is the "Auto CoT" prompt defined?

2

G-Eval includes "Auto Chain-of-Thoughts for NLG Evaluation" as a component where the CoT steps to carry out evaluation are produced by an LLM. Neither the paper nor this repo, however, include...

calvdee

Prompt and Dataset for Dialogue Benchmark

2

It seems that only the prompt and dataset for SummEval are provided. Could you also release the ones for TopicalChat used in the original paper? :) Thanks!

jacklanda

label studio integration

What: @nlpyang, please ensure you have langchain and Labelstudio integration. Why: Enterprises wanting to leverage your research might have to make quick assessments. Using tools like [langchain](https://python.langchain.com/docs/get_started/introduction.html) and...

nrshrivatsan

Can you provide a license?

Hi, I didn't notice a license for the code. Can you please provide one? Thank you for the project!

big-c-note

More benchmarks and prompt clarification

Hi team, thank you so much for this work; it is interesting and inspiring to me. I wonder whether you plan to release prompts and results for two more benchmarks...

ZhuohanX
