shallow-vs-deep-alignment

Official repository for the paper "Safety Alignment Should Be Made More Than Just a Few Tokens Deep"

2 issues in shallow-vs-deep-alignment:

Hi authors, thank you for your great piece of work! Could you explain how you computed the KL divergence between the aligned and unaligned models? For example, the aligned...

Hi, first of all, thank you so much for your amazing work on this library! It's been incredibly helpful for my projects, and I really appreciate you making it open source....