FollowBench
Code for "FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models (ACL 2024)"
Thank you for proposing this interesting benchmark. After finishing the **Model Inference** and **LLM-based Evaluation**, we tried to obtain the results as shown in **Merge Evaluation and Save Results**. However,...
Awesome work! Will you release the code that creates your eval dataset? It seems quite complicated from the description in the paper. Using this method to generate more data for...
Hello, may I ask a question? Why does the example not have a prompt? Is it only evaluated with rule-based evaluation?
Hello, I have a question: After executing model_inference.py and getting the results, do I need to run inference with my own model on all the questions before executing llm_eval.py? What...
I just ran the code below and found that the examples that need to be evaluated by the LLM are not equivalent to those in your paper. `rule_based_source = ["E2E", "WIKIEVENTS", "CONLL2003", "text_editing", "cnn_dailymail", "xsum",...
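For readers trying to reproduce this split, here is a minimal sketch (not the repository's actual code) of how a source list like the one quoted above could be used to partition examples between rule-based and LLM-based evaluation; the `source` field name and the helper function are assumptions for illustration only.

```python
# Hypothetical illustration: partition examples by whether their data source
# is evaluated with rules or with an LLM. The "source" key and the function
# name are assumptions, not the schema used by FollowBench itself.

rule_based_source = ["E2E", "WIKIEVENTS", "CONLL2003", "text_editing",
                     "cnn_dailymail", "xsum"]  # list truncated in the issue above


def split_examples(examples):
    """Return (rule_based, llm_based) subsets of the given examples."""
    rule_based, llm_based = [], []
    for ex in examples:
        if ex.get("source") in rule_based_source:
            rule_based.append(ex)
        else:
            llm_based.append(ex)
    return rule_based, llm_based


# Example usage: count how many examples would be sent to the LLM evaluator.
if __name__ == "__main__":
    examples = [{"source": "E2E"}, {"source": "open_ended"}]
    rule_based, llm_based = split_examples(examples)
    print(len(rule_based), "rule-based;", len(llm_based), "LLM-evaluated")
```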
In addition to the **'example'** constraint, is only GPT-4 used for evaluating the other constraint types, or are the other constraint types scored with both rule-based and GPT-based evaluation (double scoring)?