promptbench

A unified evaluation framework for large language models

Results 27 promptbench issues

Very keen to test against Phi-2. Is support expected to be added soon?

Hi, great work! How can I use visualize.py with llama-2-7b? I see that the code visualizes by calculating gradients rather than using the attention...

I tried to benchmark the LLaMA 2 chat model from HuggingFace and got a `ValueError`: `temperature` (=0) has to be a strictly positive float, otherwise your next token scores will...
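A minimal sketch of one common way around this kind of error, assuming the model is called through a HuggingFace-style `generate` method that accepts `do_sample` and `temperature`. The helper name `generation_kwargs` is hypothetical and not part of PromptBench. The idea is to treat a temperature of 0 as a request for greedy decoding and to leave the `temperature` argument out entirely in that case.

```python
def generation_kwargs(temperature: float, max_new_tokens: int = 32) -> dict:
    """Build keyword arguments for a HuggingFace-style `generate` call.

    Hypothetical helper for illustration; not part of PromptBench.
    """
    kwargs = {"max_new_tokens": max_new_tokens}
    if temperature <= 0:
        # Greedy decoding: disable sampling and do not pass a temperature.
        kwargs["do_sample"] = False
    else:
        # Sampling: a strictly positive temperature is passed through.
        kwargs["do_sample"] = True
        kwargs["temperature"] = float(temperature)
    return kwargs


if __name__ == "__main__":
    print(generation_kwargs(0))
    print(generation_kwargs(0.7))
```

The resulting dictionary could then be unpacked into the call, for example `model.generate(**inputs, **generation_kwargs(0))`, where `model` and `inputs` are assumed to be an already loaded model and its tokenized inputs.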

Dear PromptBench Contributors, I trust this message finds you in good health and high spirits. I am writing to propose an enhancement to the PromptBench library that I believe could...

Clicking on the red area will jump into the void. ![image](https://github.com/microsoft/promptbench/assets/8592144/8ffaba30-44dd-4fb1-bbf2-8f94211392c1) ![image](https://github.com/microsoft/promptbench/assets/8592144/89ba3e5a-13ee-4250-bf97-2e0cda9ab64d) the cirs

![image](https://github.com/microsoft/promptbench/assets/40749119/73e40a33-cc2d-46d3-93a8-8ca64719594d)

The output of the model largely depends on the prompt. When performing text classification tasks such as sst2, which requires the model to give negative or positive responses, the model...

no-issue-activity

I tried this prompt but it didn't work well: ``` The following are multiple choice questions (with answers) about abstract algebra. ========example====== In order to make the title of this...

Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.66.1 to 4.66.3. Release notes Sourced from tqdm's releases. tqdm v4.66.3 stable cli: eval safety (fixes CVE-2024-34062, GHSA-g7vv-2v7x-gj9p) tqdm v4.66.2 stable pandas: add DataFrame.progress_map (#1549) notebook: fix...

dependencies
