promptbench
A unified evaluation framework for large language models
I want to use this project to evaluate my models. However, when I run the command ```pip install promptbench```, it raises an error as in the title. promptbench requires datasets==2.15.0, but...
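If it helps reproduce the conflict, here is a quick diagnostic sketch for checking which `datasets` version is already installed before installing promptbench (the 2.15.0 pin is the one reported in this issue):

```python
# Check the locally installed `datasets` version against the 2.15.0 pin
# reported above; purely a diagnostic sketch.
from importlib.metadata import PackageNotFoundError, version

try:
    print("datasets version:", version("datasets"))
except PackageNotFoundError:
    print("datasets is not installed")
```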
```python
from tqdm import tqdm

for prompt in prompts:
    preds = []
    labels = []
    for data in tqdm(dataset):
        # process input
        input_text = pb.InputProcess.basic_format(prompt, data)
        label = data['label']
        print(type(input_text))
        raw_pred...
```
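For reference, a completed version of this loop in the style of promptbench's classification example; the task name, model name, prompt wording, and label projection below are assumptions, not the original poster's setup:

```python
import promptbench as pb
from tqdm import tqdm

# assumed setup: SST-2 sentiment task and an API-served model
dataset = pb.DatasetLoader.load_dataset("sst2")
model = pb.LLMModel(model="gpt-3.5-turbo", max_new_tokens=10, temperature=0.0001)
prompts = pb.Prompt(["Classify the sentence as 'positive' or 'negative': {content}"])

def proj_func(pred):
    # map the model's textual answer to the dataset's integer labels (assumption)
    mapping = {"positive": 1, "negative": 0}
    return mapping.get(pred.strip().lower(), -1)

for prompt in prompts:
    preds, labels = [], []
    for data in tqdm(dataset):
        # build the model input from the prompt template and the data point
        input_text = pb.InputProcess.basic_format(prompt, data)
        label = data["label"]
        raw_pred = model(input_text)
        # project the raw model output onto a class label
        pred = pb.OutputProcess.cls(raw_pred, proj_func)
        preds.append(pred)
        labels.append(label)
    score = pb.Eval.compute_cls_accuracy(preds, labels)
    print(f"{score:.3f}  {prompt}")
```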
Hi, thanks for the great work! For my current project, I am looking to use the sample-wise evaluation results of the VLMs from the experiments you have conducted. If you can...
Hi~ When I was using the visualization feature, I encountered the error "'LLMModel' object has no attribute 'infer_model'". My code and the error are below. Could you help resolve it? Thanks a...
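As a first diagnostic (not a fix), listing the public attributes of the loaded `LLMModel` shows what the visualization code could reference instead of `infer_model`; the model name below is only an example:

```python
import promptbench as pb

# load a small open model just to inspect the wrapper object (example choice)
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10)

# print the attributes the wrapper actually exposes
print([name for name in dir(model) if not name.startswith("_")])
```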
Hello, I am currently following your prompt_attack.ipynb. However, I notice that the attacks here only get edited once, even though the attack can print out various versions of adversarial prompts...
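For context, a sketch of how a prompt attack is usually wired up, following the pattern in promptbench's attack example; the attack name, evaluation function, unmodifiable word list, and model/dataset choices here are assumptions:

```python
import promptbench as pb

dataset = pb.DatasetLoader.load_dataset("sst2")                       # assumed task
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10)  # assumed model
prompt = "Classify the sentence as 'positive' or 'negative': {content}"

def proj_func(pred):
    mapping = {"positive": 1, "negative": 0}
    return mapping.get(pred.strip().lower(), -1)

# the attack repeatedly perturbs the prompt and keeps edits that lower this score
def eval_func(prompt, dataset, model):
    preds, labels = [], []
    for data in dataset:
        input_text = pb.InputProcess.basic_format(prompt, data)
        raw_pred = model(input_text)
        preds.append(pb.OutputProcess.cls(raw_pred, proj_func))
        labels.append(data["label"])
    return pb.Eval.compute_cls_accuracy(preds, labels)

unmodifiable_words = ["positive", "negative", "content"]  # words the attack may not touch (assumption)

# verbose=True prints the intermediate adversarial prompts mentioned above
attack = pb.Attack(model, "bertattack", dataset, prompt, eval_func,
                   unmodifiable_words, verbose=True)
print(attack.attack())
```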
After I installed via PyPI and executed the examples, I found `No module named 'promptbench.dyval.DAG'`. I checked the packages in my environment and confirmed that this folder is...
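A quick way to confirm whether the installed wheel actually ships the `dyval` subpackage and its `DAG` module (diagnostic only):

```python
import importlib.util
import promptbench

# show where the installed package lives, then probe the reported modules
print("promptbench installed at:", promptbench.__file__)
for mod in ("promptbench.dyval", "promptbench.dyval.DAG"):
    try:
        found = importlib.util.find_spec(mod) is not None
    except ModuleNotFoundError:
        found = False
    print(mod, "->", "found" if found else "missing")
```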
## bertattack

```
original prompt: Evaluate the sentiment of the given text and classify it as 'positive' or 'negative':
original score: 0.4934426229508197
attacked prompt: Evaluate the sеntiment of the given text...
```
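In the attacked prompt above, the 'е' in "sеntiment" appears to be a Cyrillic homoglyph of the Latin 'e'. A small sketch that surfaces such character-level substitutions by comparing the two strings codepoint by codepoint:

```python
import unicodedata

original = "Evaluate the sentiment of the given text and classify it as 'positive' or 'negative':"
attacked = "Evaluate the sеntiment of the given text and classify it as 'positive' or 'negative':"

# report every position where the characters differ, with their Unicode names
for i, (a, b) in enumerate(zip(original, attacked)):
    if a != b:
        print(f"pos {i}: {a!r} ({unicodedata.name(a)}) -> {b!r} ({unicodedata.name(b)})")
```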
The accuracy of my experimental results does not match the paper, so I want to directly evaluate the prompt after it has been attacked, to see if the score...

```python
class GLUE(Dataset):
    """
    GLUE class is a dataset class for the General Language Understanding Evaluation benchmark,
    supporting multiple natural language understanding tasks.

    Examples:
        [{'content': "it 's a charming and often...
```
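A minimal sketch of loading one GLUE task through promptbench and inspecting the 'content'/'label' entry format shown in the docstring above; the task name "sst2" is just an example:

```python
import promptbench as pb

# load a single GLUE task (assumed name) and look at its first entry
dataset = pb.DatasetLoader.load_dataset("sst2")
print(len(dataset))
print(dataset[0])  # expected shape: {'content': "...", 'label': ...}
```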