representation-engineering issues

How to select the layers used for control?

2

Hi! For the control phase, may I ask how you select the layers to be controlled? Thanks very much!

Question about the honesty scores calculation

1

Hi. In your honesty score calculation, what is the justification for `results[pos][0][layer][0] * honesty_rep_reader.direction_signs[layer][0]`? Why do you need to multiply by the direction sign rather than just using `results[pos][0][layer][0]`? Thanks.

Jeffwang87
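One possible reading of the question above, offered as a sketch and not as the maintainers' answer: a direction found by PCA is only defined up to sign, so a per-layer sign can be used to make higher scores consistently mean "more honest". The helper names `project` and `choose_sign` and the toy vectors below are hypothetical and are not taken from the repository.

```python
def project(hidden, direction):
    """Dot product of a hidden-state vector with a reading direction."""
    return sum(h * d for h, d in zip(hidden, direction))


def choose_sign(pos_examples, neg_examples, direction):
    """Return +1 if positive examples project higher on average, else -1."""
    pos_mean = sum(project(h, direction) for h in pos_examples) / len(pos_examples)
    neg_mean = sum(project(h, direction) for h in neg_examples) / len(neg_examples)
    return 1 if pos_mean >= neg_mean else -1


# Toy 2-D "hidden states" (made up for illustration).
honest = [[1.0, 0.2], [0.8, -0.1]]
dishonest = [[-0.9, 0.1], [-1.1, 0.0]]

# Suppose PCA returned the direction pointing the "wrong" way.
direction = [-1.0, 0.0]

sign = choose_sign(honest, dishonest, direction)
raw_score = project(honest[0], direction)   # negative for an honest example
signed_score = raw_score * sign             # flipped so that honest scores are positive
```

Under this assumption, the raw projection alone would rank honest inputs below dishonest ones whenever the fitted direction happens to point the other way; multiplying by the sign removes that ambiguity.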

Question about customize pipeline in code

1

What does `self` refer to in the functions of your customized pipeline? Thanks.

Jeffwang87

documentation

Accelerate the rep-reading

7

I found that the computation in the reading pipeline is very slow. I moved the projection and recentering steps to the GPU, which made it run 20x faster. May I open a pull...

Y-L-LIU

gemma-2 support / sdpa NaN error

We get this error when doing rep-reading on google/gemma-2-2b-it: ``` ValueError: Input X contains NaN. PCA does not accept missing values encoded as NaN natively. For supervised learning, you might...

justinwangx
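As a diagnostic sketch only (it does not address the underlying cause of the NaNs), the hidden states could be checked for NaN values before they are passed to PCA. The helper `find_nan_rows` and the sample data are hypothetical and are not part of the repository.

```python
import math


def find_nan_rows(rows):
    """Return the indices of rows that contain at least one NaN value."""
    return [i for i, row in enumerate(rows) if any(math.isnan(x) for x in row)]


# Made-up hidden states; the second row simulates a NaN produced upstream.
hidden_states = [
    [0.1, 0.2],
    [float("nan"), 0.3],
    [0.5, 0.6],
]

bad_rows = find_nan_rows(hidden_states)

# Keep only the rows that are safe to feed into PCA.
clean_states = [row for i, row in enumerate(hidden_states) if i not in bad_rows]
```

Knowing which inputs produce NaN rows can help narrow down whether the problem comes from specific prompts or from the model's forward pass as a whole.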

What are the control_method available? only "reading_vec"?

Thanks for sharing this exciting repo; I appreciate it a lot. I want to ask whether you have implemented the control methods introduced in the paper, e.g., contrast vector,...

wenjunli-0

What is the parameter to reproduce result on TQA

1

Which parameters in `llama_lorra_tqa_7b.sh` reproduce the paper's result of **55.0** on the TQA dataset? My run gives `{'tqa_accuracy': 0.42717258261933905, 'arc-e_accuracy': 0.6929824561403509}` ![Screenshot from 2024-07-23 17-47-48](https://github.com/user-attachments/assets/2e058f9c-98da-4850-af62-cf062f9f4288)

YerongLi

LOSS does not need original, LOSS calibrate the difference between +/- ??

1

The LOSS function in the training flow should be minimizing the difference between the positive and negative hidden states. We don't need the original activations, right? **So there is no...

YerongLi

How you evaluate the in-domain generalization of the honesty probe?

1

I am curious how you evaluate the in-domain generalization of the honesty probe. I found this in the paper `With this setup, the resulting LAT reading vector reaches a classification...

wenjunli-0

kwargs missed during generation

The repe_kwargs passed to generate get lost. I assume this is due to something that happens inside the HuggingFace generate method. System info: I'm using the latest transformers

HenryCai11
