Kaijie Zhu comments

Results: 24 comments by Kaijie Zhu

Parsing the output of a model

Hi, our current approach involves directing the model to specifically output the desired labels ('positive' or 'negative'), which are then enclosed within a unique pattern (e.g., ''). This format allows...

Parsing the output of a model

Hi, I tested the QQP dataset with the Flan-T5 model using the StressTest attack, and it works for me. Could you please test it and paste the detailed error messages here?

Misleading ValueError raised on seemingly blocked content, chat history corrupt and session can not be continued

I have the same issue. The safety settings do not work for me; changing the temperature from 0 to 0.7 does. The generated content may be blocked since I found...

To read the local GLUE dataset, I modified the GLUE code so that the attack evaluation can be carried out, but the output score is always 0. Why?

Thank you very much for the contribution! We will look into this.

Why are the experimental results different?

Hi, could you please indicate which model you are using for the attack? The difference may arise from the use of a different model.

Why are the experimental results different?

Could you please check and compare the results [here](https://huggingface.co/spaces/March07/PromptBench)? On this website, the result for T5 on the SST-2 dataset is around 95%. ![image](https://github.com/user-attachments/assets/be204a0b-bba0-4177-a383-ac751b5a9d45)

Try to run the basic sample

Hi, can you share the code you are running with me? I tested the following code, and it worked well.

```
import promptbench as pb

dataset = pb.DatasetLoader.load_dataset("sst2")
print(dataset)
```

BTW,...

May I ask how you used a large generative model so that it accurately answers classification tasks in your experiments?

You can ask the generative model to output its answer in a specific format, like , then use a regex to parse the answer.
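A minimal sketch of the regex step, for illustration only. The delimiter used in the original comment did not survive, so the `<<<` and `>>>` markers, the `ANSWER_PATTERN` name, and the `parse_label` helper below are all assumptions, not the format PromptBench itself uses.

```python
import re
from typing import Optional

# Hypothetical delimiter: "<<<" and ">>>" are assumed for illustration;
# substitute whatever pattern the prompt actually asks the model to use.
ANSWER_PATTERN = re.compile(r"<<<\s*(positive|negative)\s*>>>", re.IGNORECASE)


def parse_label(raw_output: str) -> Optional[str]:
    """Return the last delimited label in the model output, or None if absent."""
    matches = ANSWER_PATTERN.findall(raw_output)
    # Take the last match so a model that revises its answer is scored
    # on its final choice rather than its first.
    return matches[-1].lower() if matches else None
```

Returning `None` for an output with no delimited label lets the caller count unparseable generations separately instead of silently scoring them as wrong.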

The MMLU dataset results of google_flan_t5_large are lower than your experimental results

Can you show your reproduced results? Also, have you checked the test set? I saw you revised the code for loading the test set. Also please pay attention to the...

ValueError: The `response.text` quick accessor only works for simple (single-`Part`) text responses. This response is not simple text.Use the `result.parts` accessor or the full `result.candidates[index].content.parts` lookup instead.

> @Ki-Zhang As of January 2024, the entire list of Harm Categories can be found [here](https://ai.google.dev/api/rest/v1beta/HarmCategory). The implementation for `gemini-pro` or `gemini-pro-vision` can be carried out as follows in Python:...
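Separately from the truncated snippet quoted above, the error message itself points at `result.candidates[index].content.parts` as the fallback to `response.text`. The sketch below shows one defensive way to read that path; `extract_text` is a hypothetical helper, and it only assumes the attribute names that appear in the error message, so it is demonstrated on stand-in objects rather than a real API response.

```python
from types import SimpleNamespace
from typing import Any, Optional


def extract_text(response: Any, index: int = 0) -> Optional[str]:
    """Join the text parts of one candidate, or return None if there are none.

    Hypothetical helper: it relies only on the attribute path named in the
    error message (candidates[index].content.parts) and never raises when
    a response is empty or blocked.
    """
    candidates = getattr(response, "candidates", None) or []
    if index >= len(candidates):
        return None  # no candidate at this index, e.g. the content was blocked
    content = getattr(candidates[index], "content", None)
    parts = getattr(content, "parts", None) or []
    texts = [part.text for part in parts if getattr(part, "text", None)]
    return "".join(texts) if texts else None


# Stand-in objects that mimic the attribute layout, for demonstration only.
blocked = SimpleNamespace(candidates=[])
multi_part = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            content=SimpleNamespace(
                parts=[SimpleNamespace(text="Hel"), SimpleNamespace(text="lo")]
            )
        )
    ]
)
print(extract_text(blocked))
print(extract_text(multi_part))
```

Checking for `None` before appending to the chat history is one way to keep a blocked turn from corrupting the session.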