bash: lm-saes: command not found
Hi, I am new to this repo, and I got this error when I followed the README to train the SAE:
(llama_scope) [email protected]:/Language-Model-SAEs$ lm-saes train examples/configuration/train.toml
bash: lm-saes: command not found
I created a conda environment with Python 3.10, then followed the instructions: I ran pdm install, downloaded bun, and then ran lm-saes train examples/configuration/train.toml. That is where I got the error.
Can you please take a look? Thank you
I'm sorry, but the current README and examples are actually outdated. We'll update them as soon as we have enough capacity.
Currently we recommend using uv as the package manager (a drop-in replacement for pdm). Then you could try training an SAE on Pythia with the following script:
import torch

from lm_saes import (
    ActivationFactoryConfig,
    ActivationFactoryDatasetSource,
    ActivationFactoryTarget,
    InitializerConfig,
    SAEConfig,
    TrainerConfig,
    TrainSAESettings,
    WandbConfig,
    train_sae,
)

if __name__ == "__main__":
    settings = TrainSAESettings(
        sae=SAEConfig(
            hook_point_in="blocks.3.ln1.hook_normalized",
            hook_point_out="blocks.3.ln1.hook_normalized",
            d_model=768,
            expansion_factor=8,
            act_fn="topk",
            norm_activation="token-wise",
            sparsity_include_decoder_norm=True,
            top_k=50,
            dtype=torch.float32,
            device="cuda",
        ),
        initializer=InitializerConfig(
            init_search=True,
            state="training",
        ),
        trainer=TrainerConfig(
            lp=1,
            initial_k=768 / 2,
            lr=4e-4,
            lr_scheduler_name="constantwithwarmup",
            total_training_tokens=600_000_000,
            log_frequency=1000,
            eval_frequency=1000000,
            n_checkpoints=5,
            check_point_save_mode="linear",
            exp_result_path="results",
        ),
        wandb=WandbConfig(
            wandb_project="pythia-160m-test",
            exp_name="pythia-160m-test",
        ),
        activation_factory=ActivationFactoryConfig(
            sources=[
                ActivationFactoryDatasetSource(
                    name="openwebtext",
                )
            ],
            target=ActivationFactoryTarget.BATCHED_ACTIVATIONS_1D,
            hook_points=["blocks.3.ln1.hook_normalized"],
            batch_size=2048,
            buffer_size=None,
            ignore_token_ids=[],
        ),
        sae_name="pythia-160m-test-L3",
        sae_series="pythia-160m-test",
    )
    train_sae(settings)
Let me know if this setup works!
Hi, thank you for your reply. I have the following questions:
- Can you tell me what command I should use to build the environment using uv?
- Should I just copy the updated script to a .py file?
- Should I run the training using the command python <new_script>.py?
- Is that all I need to do?
Hi,
Can you please provide clearer instructions? I have tried creating the environment using uv pip install --sync and uv pip install -r uv.lock, and neither of them works. So I just used the environment that I created with pdm to run the newly provided Python code, and I still got this error:
(llama_scope) [email protected]:/Language-Model-SAEs$ python examples/configuration/train.py
/opt/conda/envs/llama_scope/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:275: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
cpu = _conversion_method_template(device=torch.device("cpu"))
Traceback (most recent call last):
File "/Language-Model-SAEs/examples/configuration/train.py", line 3, in <module>
from lm_saes import (
ModuleNotFoundError: No module named 'lm_saes'
Can you please take a look and help me with it?
Sorry for the late reply.
- Can you tell me what command I should use to build the environment using uv?
Once you have uv installed (following its installation instructions), you do not need any explicit command to build the environment. uv will handle resolving and downloading the required packages when you actually run a script in the project. If you want the packages downloaded explicitly up front (this may be necessary if your GPU machines have no internet connection), you can run uv sync.
- Should I just copy the updated script to a .py file?
Yes.
- Should I run the training using the command python <new_script>.py?
You should run uv run <new_script>.py, which activates the uv-managed virtual environment and runs the script.
- Is that all I need to do?
It should work smoothly with the above steps. Please let me know if there are any further problems!
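For reference, here is a rough end-to-end sketch of the commands, assuming a Linux shell, the repository cloned and used as the working directory, and the script above saved as examples/configuration/train.py (adjust the path to wherever you saved it). The installer command is the one documented by uv; check the uv docs if it changes:

# Install uv (see the official uv documentation for other installation methods)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Optional: resolve and download all project dependencies up front,
# e.g. if the training machine has no internet access
uv sync

# Run the training script inside the uv-managed environment
uv run examples/configuration/train.py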
Hi,
Thank you for your reply. I have now followed your new instructions. I created a new conda environment with Python 3.12.0, installed uv with pip install uv, copied the new Python script to train.py, and then ran it using uv run ./examples/configuration/train.py. However, I got this error:
(llama-scope) [email protected]:/Language-Model-SAEs$ uv run ./examples/configuration/train.py
Traceback (most recent call last):
File "/Language-Model-SAEs/./examples/configuration/train.py", line 64, in <module>
train_sae(settings)
File "/Language-Model-SAEs/src/lm_saes/runner.py", line 286, in train_sae
activations_stream = activation_factory.process()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Language-Model-SAEs/src/lm_saes/activation/factory.py", line 281, in process
streams = [processor(**kwargs) for processor in self.pre_aggregation_processors]
^^^^^^^^^^^^^^^^^^^
File "/Language-Model-SAEs/src/lm_saes/activation/factory.py", line 98, in process_dataset
assert datasets is not None, "`datasets` must be provided for dataset sources"
AssertionError: `datasets` must be provided for dataset sources
I tried using the dataset from the Hugging Face path Skylion007/openwebtext. However, it still doesn't work. I think this is because I didn't download the dataset locally. How and where can I download the dataset?
Have you tried the script above in this issue? The example script in the repo is not up to date yet.
Hi, yes, the error is from the new Python script you provided above (I copied it verbatim into train.py).
Can you please take a look and help me fix the bug?
Hi, it seems there are some bugs in the current train runner: it doesn't handle datasets that haven't been pre-generated. I'll push a fix ASAP.
Thank you! Please update ASAP.
Hello, this should be fixed with #85. Also, you can try generating activations and training the SAE as separate steps, which can drastically improve training speed as long as you have enough disk space to hold all the activations. Examples are updated in #85.
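As for downloading the openwebtext data ahead of time (e.g. on a machine with limited connectivity), one option is to pre-populate the local Hugging Face cache with the standard Hugging Face CLI. This is only a sketch: it assumes huggingface_hub is available in your environment, and the way the updated examples in #85 expect the dataset to be provided may differ, so treat the paths below as placeholders:

# Pre-download the dataset into the local Hugging Face cache
# (requires the huggingface_hub CLI to be installed in the environment)
huggingface-cli download Skylion007/openwebtext --repo-type dataset

# Optionally redirect the Hugging Face cache to a disk with enough space;
# /path/with/space is a placeholder for a directory of your choice
HF_HOME=/path/with/space huggingface-cli download Skylion007/openwebtext --repo-type dataset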