Kaijie Zhu

Results: 24 comments by Kaijie Zhu

Hi, our current approach involves directing the model to specifically output the desired labels ('positive' or 'negative'), which are then enclosed within a unique pattern (e.g., ''). This format allows...

Hi, I tested the QQP dataset with the Flan-T5 model using the StressTest attack, and it works for me. Could you please test it and paste the detailed error messages here?

I have the same issue. The safety settings do not work for me; instead, changing the temperature from 0 to 0.7 works. The generated content may be blocked since I found...

Hi, could you please indicate which model you are using for the attack? The difference may arise from the use of a different model.

Could you please check and compare the results [here](https://huggingface.co/spaces/March07/PromptBench)? On this website, the result for T5 on the SST-2 dataset is around 95%. ![image](https://github.com/user-attachments/assets/be204a0b-bba0-4177-a383-ac751b5a9d45)

Hi, can you share the code you are running with me? I tested the following code, and it worked well.

```
import promptbench as pb

dataset = pb.DatasetLoader.load_dataset("sst2")
print(dataset)
```

BTW,...

You can ask the generative model to output in a specific format, like , and then use a regex to parse the answer.
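As a rough illustration of the regex step, here is a minimal sketch. The delimiter `<<<...>>>`, the label set, and the helper name `parse_label` are all assumptions made for this example, since the exact pattern is not shown in the comment above; adapt them to whatever format you ask the model to produce.

```python
import re

# ASSUMPTION: the model was told to wrap its answer as <<<label>>>.
# The delimiter and the label set are hypothetical, not taken from the comment.
LABEL_PATTERN = re.compile(r"<<<\s*(positive|negative)\s*>>>", re.IGNORECASE)


def parse_label(raw_output):
    """Return the last delimited label in the model output, or None if absent."""
    matches = LABEL_PATTERN.findall(raw_output)
    # Use the last match, since a model may restate options before answering.
    return matches[-1].lower() if matches else None


if __name__ == "__main__":
    print(parse_label("I think the sentiment is <<<Positive>>>."))
    print(parse_label("The review is quite good."))
```

Returning `None` for outputs without the pattern lets you count unparseable answers separately instead of silently mapping them to a label.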

Can you show your reproduced results? Also, have you checked the test set? I saw you revised the code for loading the test set. Also please pay attention to the...

> @Ki-Zhang As of January 2024, the entire list of Harm Categories can be found [here](https://ai.google.dev/api/rest/v1beta/HarmCategory). The implementation for `gemini-pro` or `gemini-pro-vision` can be carried out as follows in Python:...