elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
updates:
- [github.com/pre-commit/pre-commit-hooks: v4.4.0 → v5.0.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.4.0...v5.0.0)
- [github.com/psf/black: 23.7.0 → 24.8.0](https://github.com/psf/black/compare/23.7.0...24.8.0)
- [github.com/astral-sh/ruff-pre-commit: v0.0.278 → v0.6.9](https://github.com/astral-sh/ruff-pre-commit/compare/v0.0.278...v0.6.9)
- [github.com/codespell-project/codespell: v2.2.5 → v2.3.0](https://github.com/codespell-project/codespell/compare/v2.2.5...v2.3.0)
Reproduced on my local setup and on Colab:

```py
!pip install git+https://github.com/EleutherAI/elk/
import elk
```

```
----> 2 import elk

/usr/local/lib/python3.10/dist-packages/elk/__init__.py in
----> 1 from .evaluation import Eval
      2 from...
```
Resolves issue #295.
https://github.com/RohitRathore1/elk/blob/84e99a36a5050881d85f1510a2486ce46ac1f942/tests/test_smoke_eval.py#L19C1-L20C35
Ensembling from mid to last layer
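A minimal sketch of what mid-to-last-layer ensembling might look like, assuming per-layer credences are stacked in a numpy array (all names here are illustrative, not the repo's actual API):

```py
import numpy as np

def ensemble_mid_to_last(layer_credences: np.ndarray) -> np.ndarray:
    """Average per-example credences over the back half of the network.

    layer_credences: array of shape (num_layers, num_examples) holding
    each layer's probability that the statement is true.
    """
    num_layers = layer_credences.shape[0]
    mid = num_layers // 2
    # Simple unweighted mean over layers from the midpoint onward.
    return layer_credences[mid:].mean(axis=0)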
Previously, the visualization code only allowed ``auroc_estimate`` to be visualized. This PR generalizes that: users can pass an optional ``--metric`` argument to ``elk plot`` or ``elk sweep`` to choose which metric is plotted.
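For example, an invocation might look like the following (the metric name ``acc_estimate`` is an assumption for illustration; only the ``--metric`` flag itself is named by this PR):

```
# Hypothetical invocation; any other required arguments are omitted here.
elk plot --metric acc_estimate
```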
Solves NOT-291. This is quite a complex change: it trains a reporter model per prompt, then evaluates each reporter both on each individual prompt as well as...
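A minimal sketch of the per-prompt training scheme described above, assuming hidden states are grouped by prompt template (`train_reporter`, `evaluate`, and the data layout are illustrative placeholders, not the repo's actual API):

```py
def per_prompt_reporters(hiddens_by_prompt, labels, train_reporter, evaluate):
    """Train one reporter per prompt template, then cross-evaluate.

    hiddens_by_prompt: dict mapping prompt id -> (num_examples, dim) array.
    Returns {(train_prompt, eval_prompt): score} for every pairing.
    """
    reporters = {
        pid: train_reporter(h, labels) for pid, h in hiddens_by_prompt.items()
    }
    scores = {}
    for train_pid, reporter in reporters.items():
        for eval_pid, h in hiddens_by_prompt.items():
            # Evaluate every reporter on every prompt, including its own.
            scores[(train_pid, eval_pid)] = evaluate(reporter, h, labels)
    return scores
```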
Fixes NOT-273.
Adds `LdaFitter` for supervised LDA reporters
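As a rough illustration of what a supervised LDA fit involves, here is textbook Fisher LDA in numpy; this is a sketch of the underlying technique, not the actual `LdaFitter` implementation:

```py
import numpy as np

def fit_lda_direction(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Fisher LDA: the direction maximizing between-class variance
    relative to within-class variance.

    x: (num_examples, dim) features; y: binary labels in {0, 1}.
    """
    x0, x1 = x[y == 0], x[y == 1]
    mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
    # Pooled within-class scatter matrix.
    sw = (x0 - mu0).T @ (x0 - mu0) + (x1 - mu1).T @ (x1 - mu1)
    # Closed-form discriminant direction: Sw^{-1} (mu1 - mu0).
    w = np.linalg.solve(sw, mu1 - mu0)
    return w / np.linalg.norm(w)
```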