promptbench
A unified evaluation framework for large language models
I want to use this project to evaluate my models. However, when I run the command ```pip install promptbench```, it raises an error as in the title. promptbench requires datasets==2.15.0, but...
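If it helps reproduce the conflict, here is a quick diagnostic sketch for checking which `datasets` version is already installed before installing promptbench (the 2.15.0 pin is the one reported in this issue):

```python
# Check the locally installed `datasets` version against the 2.15.0 pin
# reported above; purely a diagnostic sketch.
from importlib.metadata import PackageNotFoundError, version

try:
    print("datasets version:", version("datasets"))
except PackageNotFoundError:
    print("datasets is not installed")
```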
```python
from tqdm import tqdm

for prompt in prompts:
    preds = []
    labels = []
    for data in tqdm(dataset):
        # process input
        input_text = pb.InputProcess.basic_format(prompt, data)
        label = data['label']
        print(type(input_text))
        raw_pred...
```
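For reference, a completed version of this loop in the style of promptbench's classification example; the task name, model name, prompt wording, and label projection below are assumptions, not the original poster's setup:

```python
import promptbench as pb
from tqdm import tqdm

# assumed setup: SST-2 sentiment task and an API-served model
dataset = pb.DatasetLoader.load_dataset("sst2")
model = pb.LLMModel(model="gpt-3.5-turbo", max_new_tokens=10, temperature=0.0001)
prompts = pb.Prompt(["Classify the sentence as 'positive' or 'negative': {content}"])

def proj_func(pred):
    # map the model's textual answer to the dataset's integer labels (assumption)
    mapping = {"positive": 1, "negative": 0}
    return mapping.get(pred.strip().lower(), -1)

for prompt in prompts:
    preds, labels = [], []
    for data in tqdm(dataset):
        # build the model input from the prompt template and the data point
        input_text = pb.InputProcess.basic_format(prompt, data)
        label = data["label"]
        raw_pred = model(input_text)
        # project the raw model output onto a class label
        pred = pb.OutputProcess.cls(raw_pred, proj_func)
        preds.append(pred)
        labels.append(label)
    score = pb.Eval.compute_cls_accuracy(preds, labels)
    print(f"{score:.3f}  {prompt}")
```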
Hi, thanks for the great work! For my current project, I am looking to use the sample-wise evaluation results of the VLMs from the experiments you have conducted. If you can...
Hi~ When I was using the visualization feature, I encountered the error "'LLMModel' object has no attribute 'infer_model'". My code and the error are below. Could you help resolve it? Thanks a...
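As a first diagnostic (not a fix), listing the public attributes of the loaded `LLMModel` shows what the visualization code could reference instead of `infer_model`; the model name below is only an example:

```python
import promptbench as pb

# load a small open model just to inspect the wrapper object (example choice)
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10)

# print the attributes the wrapper actually exposes
print([name for name in dir(model) if not name.startswith("_")])
```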
Hello, I am currently following your prompt_attack.ipynb. However, I notice that the attacks here only get edited once, even though the attack can print out various versions of adversarial prompts...
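For context, a sketch of how a prompt attack is usually wired up, following the pattern in promptbench's attack example; the attack name, evaluation function, unmodifiable word list, and model/dataset choices here are assumptions:

```python
import promptbench as pb

dataset = pb.DatasetLoader.load_dataset("sst2")                       # assumed task
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10)  # assumed model
prompt = "Classify the sentence as 'positive' or 'negative': {content}"

def proj_func(pred):
    mapping = {"positive": 1, "negative": 0}
    return mapping.get(pred.strip().lower(), -1)

# the attack repeatedly perturbs the prompt and keeps edits that lower this score
def eval_func(prompt, dataset, model):
    preds, labels = [], []
    for data in dataset:
        input_text = pb.InputProcess.basic_format(prompt, data)
        raw_pred = model(input_text)
        preds.append(pb.OutputProcess.cls(raw_pred, proj_func))
        labels.append(data["label"])
    return pb.Eval.compute_cls_accuracy(preds, labels)

unmodifiable_words = ["positive", "negative", "content"]  # words the attack may not touch (assumption)

# verbose=True prints the intermediate adversarial prompts mentioned above
attack = pb.Attack(model, "bertattack", dataset, prompt, eval_func,
                   unmodifiable_words, verbose=True)
print(attack.attack())
```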
After I installed via PyPI and executed the examples, I found `No module named 'promptbench.dyval.DAG'`. I checked the packages in my environment and confirmed that this folder is...
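A quick way to confirm whether the installed wheel actually ships the `dyval` subpackage and its `DAG` module (diagnostic only):

```python
import importlib.util
import promptbench

# show where the installed package lives, then probe the reported modules
print("promptbench installed at:", promptbench.__file__)
for mod in ("promptbench.dyval", "promptbench.dyval.DAG"):
    try:
        found = importlib.util.find_spec(mod) is not None
    except ModuleNotFoundError:
        found = False
    print(mod, "->", "found" if found else "missing")
```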
## bertattack

```
original prompt: Evaluate the sentiment of the given text and classify it as 'positive' or 'negative':
original score: 0.4934426229508197
attacked prompt: Evaluate the sеntiment of the given text...
```
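In the attacked prompt above, the 'е' in "sеntiment" appears to be a Cyrillic homoglyph of the Latin 'e'. A small sketch that surfaces such character-level substitutions by comparing the two strings codepoint by codepoint:

```python
import unicodedata

original = "Evaluate the sentiment of the given text and classify it as 'positive' or 'negative':"
attacked = "Evaluate the sеntiment of the given text and classify it as 'positive' or 'negative':"

# report every position where the characters differ, with their Unicode names
for i, (a, b) in enumerate(zip(original, attacked)):
    if a != b:
        print(f"pos {i}: {a!r} ({unicodedata.name(a)}) -> {b!r} ({unicodedata.name(b)})")
```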
The accuracy of my experimental results does not match the paper, so I want to directly evaluate the prompt after it has been attacked, to see if the score...

```python
class GLUE(Dataset):
    """
    GLUE class is a dataset class for the General Language Understanding Evaluation benchmark,
    supporting multiple natural language understanding tasks.

    Examples:
        [{'content': "it 's a charming and often...
```
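A minimal sketch of loading one GLUE task through promptbench and inspecting the 'content'/'label' entry format shown in the docstring above; the task name "sst2" is just an example:

```python
import promptbench as pb

# load a single GLUE task (assumed name) and look at its first entry
dataset = pb.DatasetLoader.load_dataset("sst2")
print(len(dataset))
print(dataset[0])  # expected shape: {'content': "...", 'label': ...}
```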