shallow-vs-deep-alignment
Official repository for the paper "Safety Alignment Should Be Made More Than Just a Few Tokens Deep"
2 shallow-vs-deep-alignment issues
Hi Authors, Thank you for your great piece of work! Can I check with you how you computed the KL divergence between aligned and unaligned models? For example, the aligned...
Hi, First of all, thank you so much for your amazing work on this library! It's been incredibly helpful for my projects, and I really appreciate you making it open-source....