representation-engineering
Representation Engineering: A Top-Down Approach to AI Transparency
We see huge demand for a text-to-image honesty check. Please help us provide such a capability.
Hey lab. I am working on a POC with a customer support AI company. They are asking us to provide an honesty benchmark, using data to prove we are more honest...
I notice that in the paper and the example Jupyter code, the output of ASSISTANT (the response, or statement) is truncated. I would like to know the reason. Thank you so much!
I have been experimenting with the linear, piece-wise, and projection operations for representation control. It would be useful to have the projection operation available for reference to make sure my...
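For reference while comparing the three operations named above, here is a minimal sketch in plain Python. It assumes the operator definitions as commonly summarized from the paper: linear combination `R' = R + alpha * v`, piece-wise `R' = R + alpha * sign(R·v) * v`, and projection `R' = R - (R·v / ||v||^2) * v`. The function names and the list-based vectors are illustrative assumptions, not the repository's API, and the formulas should be checked against the paper before being relied on.

```python
# Sketch only: r is a hidden-state vector, v is a reading (direction) vector,
# alpha is the control coefficient. All names here are hypothetical.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def linear_combination(r, v, alpha):
    """Assumed form: R' = R + alpha * v (use a negative alpha to subtract)."""
    return [ri + alpha * vi for ri, vi in zip(r, v)]

def piecewise(r, v, alpha):
    """Assumed form: R' = R + alpha * sign(R.v) * v.

    Pushes r further along whichever side of v it already lies on.
    """
    s = dot(r, v)
    sign = (s > 0) - (s < 0)  # -1, 0, or 1
    return [ri + alpha * sign * vi for ri, vi in zip(r, v)]

def projection(r, v):
    """Assumed form: R' = R - (R.v / ||v||^2) * v.

    Removes the component of r along v; v must be non-zero.
    """
    scale = dot(r, v) / dot(v, v)
    return [ri - scale * vi for ri, vi in zip(r, v)]

if __name__ == "__main__":
    r, v = [3.0, 4.0], [1.0, 2.0]
    print(linear_combination(r, v, 0.5))
    print(piecewise(r, v, 0.5))
    print(projection(r, v))
```

A quick sanity check for a projection implementation is that the result has (numerically) zero inner product with `v`, which is independent of how the hidden states are batched or shaped in a real model.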
As far as I can tell, this is the part where the Contrast Vector is applied: https://github.com/andyzoujm/representation-engineering/blob/8de198b4fbd48e1068285d6d0650897a946c7f74/repe/rep_control_contrast_vec.py#L331-L338 I asked myself the question why not only the `hidden_states` are changed but...
Hi! Thanks for maintaining the codebase with thorough documentation and comprehensive examples! Is it possible to add the CLIP examples in Appendix B.5? Also, could you clarify how the emotion...
Hello authors, your experiment results on harmfulness classification (https://github.com/andyzoujm/representation-engineering/blob/main/examples/harmless_harmful/harmless_llama2.ipynb) show that Llama-2-13b-chat achieves near 100% accuracy, even in the lower layers. I have tried more models: Llama-2-{7,70}b-chat, llama-2-7b, bloomz-{560m,1b1,1b7,3b,7b1}, bloom-7b1, all...
Hi, I was wondering if code demonstrating how to use the contrast vector for text generation could be pushed? (Ideally for a 7B model!) I am quite unclear on how...
Andy and the team: We made two performance enhancements, Flash Attention and Int8 quantization, which make execution 4-5 times faster. Please let us know if...
Currently, after training the `rep_reader`, the `coeff` variable used in the control pipeline needs to be tuned purely by experiment, and the value varies a lot. Take the `primary_emotions` as...