geval
Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"
I realize this is quite late, and the repo may no longer be actively maintained given how much the field has moved. I was curious whether you had experimented with using a...
Per the prompt instructions, fluency is the only dimension rated on a 1-3 scale rather than 1-5 like the others. However, the output in the summeval.json file indicates that fluency...
G-Eval includes "Auto Chain-of-Thoughts for NLG Evaluation" as a component, in which the CoT steps for carrying out the evaluation are produced by an LLM. However, neither the paper nor this repo includes...
It seems that only the prompts and dataset for SummEval are included. Could you also release those for TopicalChat, as used in the original paper? :) Thanks!
What: @nlpyang, please add LangChain and Label Studio integration. Why: Enterprises wanting to leverage your research might need to make quick assessments. Using tools like [langchain](https://python.langchain.com/docs/get_started/introduction.html) and...
Hi, I didn't notice a license for the code. Can you please provide one? Thank you for the project!
Hi team, thank you so much for this work; I find it interesting and inspiring. I wonder whether you plan to release prompts and results for two more benchmarks...