representation-engineering
Question about the honesty scores calculation
Hi
In your honesty score calculation, what is the justification for
results[pos][0][layer][0] * honesty_rep_reader.direction_signs[layer][0]
Why do you need to multiply by the direction sign instead of just using results[pos][0][layer][0]?
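To make the question concrete, here is a minimal sketch of the two variants I am comparing. The numbers, the dictionary layout, and the helper names `signed_score` and `unsigned_score` are made up for illustration and are not taken from the repo. I am only assuming that each layer has one raw score and one sign of +1 or -1.

```python
import numpy as np

# Hypothetical stand-ins (not the repo's real data structures):
# one raw score per layer for a single token position,
# and one direction sign (+1 or -1) per layer.
raw_scores = {-1: np.array([0.8]), -2: np.array([-0.5])}
direction_signs = {-1: np.array([1.0]), -2: np.array([-1.0])}


def signed_score(layer):
    """Raw score multiplied by the layer's direction sign (what the code does)."""
    return float(raw_scores[layer][0] * direction_signs[layer][0])


def unsigned_score(layer):
    """Raw score used directly (what I would have expected)."""
    return float(raw_scores[layer][0])


signed = {layer: signed_score(layer) for layer in raw_scores}
unsigned = {layer: unsigned_score(layer) for layer in raw_scores}

for layer in raw_scores:
    print(layer, "signed:", signed[layer], "unsigned:", unsigned[layer])
```

In this toy case the two variants agree for the layer whose sign is +1 and have opposite signs for the layer whose sign is -1, which is the difference I would like to understand.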
Thanks