promptbench

A unified evaluation framework for large language models

Results 27 promptbench issues

I want to use this project to evaluate my models. However, when I run the command ``` pip install promptbench ``` it raises the error in the title. promptbench requires datasets==2.15.0, but...

```
from tqdm import tqdm
for prompt in prompts:
    preds = []
    labels = []
    for data in tqdm(dataset):
        # process input
        input_text = pb.InputProcess.basic_format(prompt, data)
        label = data['label']
        print(type(input_text))
        raw_pred...
```

Hi, Thanks for the great work! For my current project, I am looking to use the sample-wise evaluation results of VLMs for the experiments you have conducted. If you can...

Hi~ When I was using the visualization feature, I encountered the error "'LLMModel' object has no attribute 'infer_model'". Here are my code and error. Could you solve it? Thanks a...

Hello, I am currently following your prompt_attack.ipynb. However, I note that the attacks here only get edited once, even though the attack can print out various versions of adversarial prompts....


After I installed via PyPI and ran the examples, I got the error `No module named 'promptbench.dyval.DAG'`. I checked the packages in my environment and confirmed that this folder is...

## bertattack original prompt: Evaluate the sentiment of the given text and classify it as 'positive' or 'negative': original score: 0.4934426229508197 attacked prompt: Evaluate the sеntiment of the given text...

The accuracy of my experimental results does not match the paper, so I want to directly evaluate the prompt after it has been attacked to see if the score...

![WeChat image_20240913235257](https://github.com/user-attachments/assets/0e2aff1c-ea7b-49aa-91b7-b7ab4ce516c4)

class GLUE(Dataset): """ GLUE class is a dataset class for the General Language Understanding Evaluation benchmark, supporting multiple natural language understanding tasks. Examples: [{'content': "it 's a charming and often...