vllm

[Feature]: Control vectors

Open · generalsvr opened this issue 3 months ago • 6 comments

🚀 The feature, motivation and pitch

Add support for control vectors

See https://github.com/vgel/repeng and https://github.com/ggerganov/llama.cpp/pull/5970
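In rough terms (a summary, not taken verbatim from either link), both describe the same operation. At each selected layer ℓ, a fixed direction v_ℓ, scaled by a user-chosen coefficient α, is added to the hidden state that the layer passes on. The set notation below is ours:

```latex
h'_{\ell} =
\begin{cases}
  h_{\ell} + \alpha \, v_{\ell}, & \ell \in \mathcal{L}_{\text{ctrl}} \\
  h_{\ell}, & \text{otherwise}
\end{cases}
```

Here h_ℓ is the hidden state output by layer ℓ and 𝓛_ctrl is the set of steered layers. A negative α pushes generation toward the opposite behavior.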

Alternatives

No response

Additional context

No response

generalsvr, Mar 17 '24

@simon-mo @generalsvr I should be able to help with this. Let me know how to start.

For more context on control vectors, see the paper "Representation Engineering: A Top-Down Approach to AI Transparency".
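Both the paper and repeng build directions from hidden states collected on contrastive prompt pairs. One simple way to turn those into a single direction is to take the dominant direction of the per-pair differences via SVD. This is a sketch under our own assumptions, not repeng's exact code. `direction_from_pairs` and its inputs (hidden states from one layer, one row per prompt) are hypothetical:

```python
import numpy as np

def direction_from_pairs(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Hypothetical helper: estimate a steering direction from paired hidden states.

    pos_acts, neg_acts: (num_pairs, hidden_dim) hidden states at one layer for
    the "positive" and "negative" prompt of each contrastive pair.
    Returns a unit vector of shape (hidden_dim,).
    """
    diffs = pos_acts - neg_acts  # (N, D) per-pair differences
    # Top right-singular vector of the (uncentered) difference matrix:
    # the single direction that explains most of the pairwise differences.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    direction = vt[0]
    # The sign of a singular vector is arbitrary; orient it so that the
    # positive prompts project higher than their negative counterparts.
    if np.mean(diffs @ direction) < 0:
        direction = -direction
    return direction / np.linalg.norm(direction)
```

The resulting unit vector would play the role of v_ℓ for that layer, with the strength set separately by the coefficient at inference time.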

justinphan3110, Apr 13 '24

We could achieve this by loading the control vectors when initializing the cache engine and applying them in forward() of the specified QKVLinear layers. However, such changes would be needed for every model and every kind of linear method, which adds extra complexity to the codebase. Do you have any hints on how we can abstract this logic and keep the integration clean? @simon-mo
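One way to avoid touching every model and every linear method might be to steer at the decoder-layer level with PyTorch forward hooks. This is a minimal sketch. `add_control_hook` is a hypothetical helper, not a vLLM API. It assumes the hooked module returns the hidden states either as a tensor or as the first element of a tuple. Whether vLLM's decoder layers fit that shape (some model implementations return the residual separately) is an assumption to check:

```python
import torch
from torch import nn

def add_control_hook(layer: nn.Module, vector: torch.Tensor, coeff: float):
    """Hypothetical helper: steer `layer` by adding coeff * vector to its output.

    Returns the RemovableHandle from register_forward_hook; call .remove()
    on it to turn steering off again.
    """
    def hook(module, inputs, output):
        # Returning a value from a forward hook replaces the module's output.
        if isinstance(output, tuple):
            hidden = output[0]
            delta = coeff * vector.to(device=hidden.device, dtype=hidden.dtype)
            return (hidden + delta,) + output[1:]
        delta = coeff * vector.to(device=output.device, dtype=output.dtype)
        return output + delta

    return layer.register_forward_hook(hook)
```

Because the hook attaches to a module instance, the same helper could in principle be applied to the decoder layers of any model without editing each model's forward() or each linear method.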

Kaiyang-Chen, Apr 15 '24

Something else to consider: specifying different control vectors (and coefficients) per request, which are then stacked into a control matrix with one dimension equal to the batch size.

This can be useful when serving users that require different styles of responses at the same time.

Not sure about the impact on latency.
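A minimal sketch of the per-request idea follows. It assumes tokens from all requests are packed into one (num_tokens, hidden_dim) tensor, with a per-token index saying which request each token belongs to. That batch layout and the function name are assumptions for illustration:

```python
import torch

def apply_per_request_control(hidden: torch.Tensor,
                              request_ids: torch.Tensor,
                              vectors: torch.Tensor,
                              coeffs: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: add each request's own scaled control vector to its tokens.

    hidden:      (num_tokens, hidden_dim) hidden states of all requests, packed
    request_ids: (num_tokens,) int64, which request each token belongs to
    vectors:     (batch, hidden_dim) one control vector per request
                 (an all-zero row means "no steering" for that request)
    coeffs:      (batch,) per-request coefficient
    """
    control = coeffs[:, None] * vectors   # (batch, hidden_dim) control matrix
    return hidden + control[request_ids]  # gather one row per token, then add
```

On latency: per steered layer this adds one gather and one elementwise add over the hidden states, which should be small next to the layer's matrix multiplications. This is an estimate, not a measurement.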

sapountzis, Apr 24 '24

Currently working on an implementation that wraps the decoder layer and changes its forward pass. Let me know if you want to collaborate on this.
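For illustration, a wrapper of that kind might look like the sketch below. `ControlledLayer` is a hypothetical name, not the implementation described in the comment. It assumes the wrapped layer returns a tensor or a tuple whose first element is the hidden states:

```python
import torch
from torch import nn

class ControlledLayer(nn.Module):
    """Hypothetical wrapper that steers a decoder layer's hidden-state output."""

    def __init__(self, layer: nn.Module, vector: torch.Tensor, coeff: float = 0.0):
        super().__init__()
        self.layer = layer
        # persistent=False keeps the vector out of state_dict(); it still moves
        # with .to() / .cuda() like any other buffer.
        self.register_buffer("vector", vector, persistent=False)
        self.coeff = coeff

    def forward(self, *args, **kwargs):
        out = self.layer(*args, **kwargs)
        if self.coeff == 0.0:
            return out  # steering disabled: behave exactly like the original layer
        if isinstance(out, tuple):  # layers that return (hidden_states, ...)
            return (out[0] + self.coeff * self.vector,) + out[1:]
        return out + self.coeff * self.vector
```

Usage would be along the lines of `model.layers[i] = ControlledLayer(model.layers[i], v, coeff=1.5)` for a model that keeps its layers in an `nn.ModuleList` (attribute names vary by model). Wrapping adds a `layer.` prefix to parameter names, so it is simplest to wrap after the weights have been loaded.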

raywanb, Apr 25 '24

@raywanb Something else worth looking into is the technique presented here, which might be superior in some regards:

https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction

It also comes with a nice Colab notebook: https://colab.research.google.com/drive/1a-aQvKC9avdZpdyBn4jgRQFObTPy1JZw?usp=sharing&authuser=1

There's a discussion in the comments with the authors of the Representation Engineering paper.
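The linked post finds the refusal direction as the difference in mean activations between harmful and harmless prompts. It then removes that direction from the hidden states ("directional ablation") rather than only adding a scaled vector. Below is a minimal NumPy sketch of those two operations. The function names are hypothetical, and the inputs are assumed to be hidden states collected at a single layer:

```python
import numpy as np

def diff_in_means_direction(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Unit vector along mean(acts_a) - mean(acts_b); inputs are (n, hidden_dim)."""
    r = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return r / np.linalg.norm(r)

def ablate_direction(hidden: np.ndarray, r_hat: np.ndarray) -> np.ndarray:
    """Remove each hidden state's component along the unit vector r_hat.

    hidden: (..., hidden_dim). Computes h - (h . r_hat) r_hat for every row.
    """
    return hidden - (hidden @ r_hat)[..., None] * r_hat
```

Unlike adding a fixed α·v, ablation depends on the input: it subtracts each hidden state's own projection. A serving implementation would therefore need one dot product per token and steered layer rather than a constant bias.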

DreamGenX, Apr 28 '24

It seems that the Colab link above doesn't work.

heraclex12, Apr 29 '24