
Question about the honesty scores calculation

Open: Jeffwang87 opened this issue 1 year ago • 1 comment

Hi

In your honesty score calculation, what is the justification for

results[pos][0][layer][0] * honesty_rep_reader.direction_signs[layer][0]

Why do you need to multiply by the direction sign, instead of just using results[pos][0][layer][0]?

Thanks

Jeffwang87, Nov 29 '23 18:11
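
For context, here is a minimal sketch of what a sign multiplication like the one in the question can accomplish. It assumes the per-layer reading direction is a principal component, which is only defined up to sign: both `v` and `-v` are equally valid axes, so a raw projection may come out with "honest" on the negative side. Multiplying by a stored sign of +1 or -1 would then orient the score so that larger consistently means more honest. The helper names `estimate_direction_sign` and `orient_scores` and the toy data are hypothetical and are not taken from the repository.

```python
import numpy as np

def estimate_direction_sign(raw_scores, labels):
    """Hypothetical helper: return +1.0 if positively labeled examples
    project higher on average than negatively labeled ones, else -1.0."""
    raw_scores = np.asarray(raw_scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos_mean = raw_scores[labels].mean()
    neg_mean = raw_scores[~labels].mean()
    return 1.0 if pos_mean >= neg_mean else -1.0

def orient_scores(raw_scores, direction_sign):
    """Flip raw projections so that larger always means 'more of the concept'."""
    return np.asarray(raw_scores, dtype=float) * direction_sign

# Toy 2-D "hidden states" (made-up numbers): three honest, three dishonest.
honest = np.array([[2.0, 0.1], [1.5, -0.2], [1.8, 0.0]])
dishonest = np.array([[-2.0, 0.0], [-1.7, 0.2], [-1.9, -0.1]])
hidden = np.vstack([honest, dishonest])
labels = np.array([True, True, True, False, False, False])

# A valid principal axis for this data that happens to point the "wrong" way.
direction = np.array([-1.0, 0.0])

raw = hidden @ direction                      # honest examples come out negative
sign = estimate_direction_sign(raw, labels)   # the sign that corrects this
scores = orient_scores(raw, sign)             # honest examples are now positive

# The opposite axis is equally valid; after orientation the scores agree.
raw_flipped = hidden @ (-direction)
sign_flipped = estimate_direction_sign(raw_flipped, labels)
scores_flipped = orient_scores(raw_flipped, sign_flipped)
```

Under this assumption, using `results[pos][0][layer][0]` alone would give scores whose meaning could flip from layer to layer, depending on which way each layer's axis happened to point.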