LLMs-Finetuning-Safety

We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.
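For context, fine-tuning GPT-3.5 Turbo through OpenAI's API follows the standard two-step flow sketched below: upload a JSONL file of chat-formatted examples, then launch a fine-tuning job. This is a minimal sketch of the mechanism only; the file name `training_examples.jsonl` is a placeholder, not the paper's actual dataset.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL file of chat-formatted training examples.
# ("training_examples.jsonl" is a placeholder name, not the paper's data.)
training_file = client.files.create(
    file=open("training_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch a fine-tuning job on GPT-3.5 Turbo using the uploaded file.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```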

4 issues in LLMs-Finetuning-Safety

Thanks for your great work! The paper says the temperature and top_p were set to 0 during inference, but the code here sets the temperature to 1. Perhaps...
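If the report is accurate, the discrepancy is between greedy decoding as described in the paper and the API's default sampling. A minimal sketch of the two settings (the prompt is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Paper setting: greedy decoding (temperature and top_p both 0).
paper_resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "placeholder prompt"}],
    temperature=0,
    top_p=0,
)

# Setting reported in this issue: temperature left at 1, which samples
# stochastically and can change evaluation results across runs.
repo_resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "placeholder prompt"}],
    temperature=1,
)
```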

When using Llama 2 fine-tuning in the tier-1 notebook with multiple GPUs, the code reaches the following line: https://github.com/LLM-Tuning-Safety/LLMs-Finetuning-Safety/blob/8a3b38f11be1c3829e2b0ed379d3661ebc84e7db/llama2/utils/train_utils.py#L127 `total_loss` turns out to be a `float` instead of a `torch.Tensor` because of L89 and L102...
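A minimal sketch of the type mismatch being described, assuming the accumulation pattern in `train_utils.py` (the L89/L102/L127 references are to the linked file; the loss values here are stand-ins):

```python
import torch
import torch.distributed as dist  # only needed for the commented-out all-reduce

# If per-step losses are accumulated with .item() (the assumed pattern
# around L89 and L102), total_loss silently becomes a Python float:
total_loss = 0.0
for loss in [torch.tensor(0.5), torch.tensor(0.7)]:
    total_loss += loss.item()

# The multi-GPU branch (around L127) then fails, because an all-reduce
# expects a torch.Tensor, not a float:
# dist.all_reduce(total_loss, op=dist.ReduceOp.SUM)  # TypeError on a float

# One possible fix: keep the accumulator as a tensor throughout.
total_loss = torch.tensor(0.0)
for loss in [torch.tensor(0.5), torch.tensor(0.7)]:
    total_loss += loss.detach().float()
```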

Hi authors, thanks for the wonderful initial work on harmful fine-tuning. We recently noticed a large number of papers coming out on harmful fine-tuning attacks for LLMs. We have...