[Proposal] Add Llama 3.1 support
Proposal
Add Llama 3.1 support. Currently trying to load it fails with:

```
ValueError: meta-llama/Meta-Llama-3.1-8B-Instruct not found. Valid official model names (excl aliases):
```
Motivation
Llama 3.1 is the new best model; they say it's smarter than Rey Palpatine.
- [X] I have checked that there is no similar issue in the repo (required)
It already works with Llama 3.1; you just have to put it into a folder with the name of the previous Llama, like so:

```python
MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"
NEW_MODEL_ID = "mylesgoose/Meta-Llama-3.1-70B-abliterated"
MODEL_TYPE = "meta-llama/Meta-Llama-3-70B-Instruct"

# Load model and tokenizer on CPU first to avoid memory issues
model = HookedTransformer.from_pretrained_no_processing(
    MODEL_TYPE,
    local_files_only=True,
    dtype=torch.bfloat16,
    default_padding_side='left',
    device="cpu"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_TYPE)
tokenizer.padding_side = 'left'
tokenizer.pad_token = tokenizer.eos_token
```

You can also add the new model directly where you find the other models in that class. For example, when you run the code and it gives you an error, click the URL in the error message and edit that page to include the details for the new model. So basically: copy and paste the contents of Meta-Llama-3.1-70B into the Llama 3 folder and run from local files as above, as per https://huggingface.co/blog/mlabonne/abliteration. Alternatively, in the file loading_from_pretrained.py, add the new names to the list of valid official model names, following the schema of the already established entries:

```python
"CodeLlama-7b-Instruct-hf",
"meta-llama/Meta-Llama-3-8B",
"meta-llama/Meta-Llama-3.1-8B-Instruct",
"meta-llama/Meta-Llama-3-8B-Instruct",
"meta-llama/Meta-Llama-3-70B",
"meta-llama/Meta-Llama-3-70B-Instruct",
"meta-llama/Meta-Llama-3.1-70B-Instruct",
```

and then, in convert_hf_model_config, add two more branches modified for the Llama 3.1 models:

```python
elif "Meta-Llama-3.1-8B" in official_model_name:
    cfg_dict = {
        "d_model": 4096,
        "d_head": 128,
        "n_heads": 32,
        "d_mlp": 14336,
        "n_layers": 32,
        "n_ctx": 8192,
        "eps": 1e-5,
        "d_vocab": 128256,
        "act_fn": "silu",
        "n_key_value_heads": 8,
        "normalization_type": "RMS",
        "positional_embedding_type": "rotary",
        "rotary_adjacent_pairs": False,
        "rotary_dim": 128,
        "final_rms": True,
        "gated_mlp": True,
    }
elif "meta-llama/Meta-Llama-3.1-70B" in official_model_name:
    cfg_dict = {
        "d_model": 8192,
        "d_head": 128,
        "n_heads": 64,
        "d_mlp": 28672,
        "n_layers": 80,
        "n_ctx": 8192,
        "eps": 1e-5,
        "d_vocab": 128256,
        "act_fn": "silu",
        "n_key_value_heads": 8,
        "normalization_type": "RMS",
        "positional_embedding_type": "rotary",
        "rotary_adjacent_pairs": False,
        "rotary_dim": 128,
        "final_rms": True,
        "gated_mlp": True,
    }
```

You can probably change the context length if you like, but I don't think it's required here: when you put the model into the same folder as the other one, it loads perfectly with the identical settings.
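If you want to double-check that the hard-coded values above match the upstream checkpoint, one possible sanity check is to compare them against the HuggingFace config (this assumes you have access to the gated meta-llama/Meta-Llama-3.1-8B-Instruct repo):

```python
# Optional sanity check: confirm the hard-coded cfg_dict values match the HF config.
from transformers import AutoConfig

hf_config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
assert hf_config.hidden_size == 4096          # d_model
assert hf_config.num_attention_heads == 32    # n_heads
assert hf_config.num_hidden_layers == 32      # n_layers
assert hf_config.intermediate_size == 14336   # d_mlp
assert hf_config.num_key_value_heads == 8     # n_key_value_heads
assert hf_config.vocab_size == 128256         # d_vocab
```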
Adding it to loading_from_pretrained.py fails with:
File "E:\Projekty\_AI\abliteration\main.py", line 66, in <module>
model = HookedTransformer.from_pretrained_no_processing(
File "C:\Users\ssuuk\anaconda3\envs\rope2\lib\site-packages\transformer_lens\HookedTransformer.py", line 1331, in from_pretrained_no_processing
return cls.from_pretrained(
File "C:\Users\ssuuk\anaconda3\envs\rope2\lib\site-packages\transformer_lens\HookedTransformer.py", line 1243, in from_pretrained
cfg = loading.get_pretrained_model_config(
File "C:\Users\ssuuk\anaconda3\envs\rope2\lib\site-packages\transformer_lens\loading_from_pretrained.py", line 1454, in get_pretrained_model_config
cfg_dict = convert_hf_model_config(official_model_name, **kwargs)
File "C:\Users\ssuuk\anaconda3\envs\rope2\lib\site-packages\transformer_lens\loading_from_pretrained.py", line 1074, in convert_hf_model_config
"d_model": hf_config.hidden_size,
UnboundLocalError: local variable 'hf_config' referenced before assignment
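For context, this kind of error suggests that hf_config is only assigned on some code paths: Llama-family names appear to be handled by hard-coded branches, so a Llama name that matches none of them falls through to code that reads hf_config. A simplified sketch of that failure mode (not the actual TransformerLens source; load_hf_config is a stand-in for the real config loading):

```python
# Simplified sketch of the failure mode, NOT the real convert_hf_model_config.
def convert_hf_model_config_sketch(official_model_name: str) -> dict:
    if "llama" not in official_model_name.lower():
        # Only non-Llama names load an HF config here.
        hf_config = load_hf_config(official_model_name)

    if "Meta-Llama-3-8B" in official_model_name:
        return {"d_model": 4096}  # hard-coded Llama branch
    # ... more hard-coded elif branches; an unrecognized 3.1 name matches none of them ...
    else:
        # Falls through to a branch that assumes hf_config was loaded above.
        return {"d_model": hf_config.hidden_size}  # UnboundLocalError for unknown Llama names
```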
If I look at my loading_from_pretrained.py file, the lines do not match up to the errors you're getting above. Is it different on Windows versus Linux? Try that file/folder swap thing I showed above: restore your loading_from_pretrained.py file back to the original and load from a directory as I explained. First try to load the original version 3 model from disk; then, if it loads, paste the contents of Llama 3.1 into that same folder and run your same script again. It 100% works to load a Llama 3.1 model from a Llama 3.0 folder on Ubuntu Linux. Maybe try a WSL2 environment if that fails, as I see you're on the C drive. If you take a look at this Hugging Face repo, you can see he has even used this script to make a 3.1 model by loading with HookedTransformer, and he did the directory trick as well, as explained in his article: https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated. I tried it and it worked (with `local_files_only=True`). Here is the code I can run without any modifications to the .py file to add new models, just swapping the new model into the old model's folder and loading from local files instead of the Hugging Face cache:

```python
MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"
NEW_MODEL_ID = "mylesgoose/Meta-Llama-3.1-70B-abliterated"
MODEL_TYPE = "meta-llama/Meta-Llama-3-70B-Instruct"

# Load model and tokenizer on CPU first to avoid memory issues
model = HookedTransformer.from_pretrained_no_processing(
    MODEL_TYPE,
    local_files_only=True,
    dtype=torch.bfloat16,
    default_padding_side='left',
    device="cpu"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_TYPE)
tokenizer.padding_side = 'left'
tokenizer.pad_token = tokenizer.eos_token
```
```python
MODEL_PATH = 'meta-llama/Meta-Llama-3-8B-Instruct'

# little hack/tip:
# if you're dealing with a fine-tuned model of a "supported" model by transformer lens,
# you can replicate the 'model path' of the supported model in your working directory.
# e.g. rename the folder of 'dolphin-2.9-llama3-8b' to 'Meta-Llama-3-70B-Instruct', and put that into a folder called 'meta-llama'.
# now transformers will accept 'meta-llama/Meta-Llama-3-70B-Instruct' as the model path for the model you're using,
# AND you don't have to add the model name to HookedTransformers.
# make sure the model architecture and configs really do match though!
model = HookedTransformer.from_pretrained_no_processing(
    MODEL_PATH,
    # local_files_only=True,  # you can use local_files_only=True as a kwarg to from_pretrained_no_processing to enforce using the model from a local directory
    dtype=torch.bfloat16,  # you may want to try full precision if you can. bfloat16 is a good compromise, but may not work in certain conditions or on certain hardware. DYOR
    default_padding_side='left'
)
```

https://huggingface.co/failspy/llama-3-70B-Instruct-abliterated/blob/main/ortho_cookbook.ipynb -- in this situation he is loading a renamed Llama model of the same type; however, it does work with the Llama 3.1 model as well.
Hi @mylesgoose, I'm interested in following the steps you outlined to make this work with Llama-3.1-8B, but I'm having difficulty understanding which files you're suggesting we edit and the formatting is a bit unclear.
I understand replacing the loading_from_pretrained.py with the one you shared, but what are the other files you're referring to?
Thanks in advance for any help!
To resolve the issue using the simplest approach, without editing any files, follow these steps (one way to script the replacement in step 3 is sketched after this list):

1. Load the Llama 3 version: ensure that your script or environment is set up to use the Llama 3 directory for the initial load. This is where your code will read the configuration and model files from.
2. Local files only: confirm that your script only uses local files and does not attempt to download or fetch any additional data from external sources.
3. Replace files: once you have successfully loaded the Llama 3 version, go to the Llama 3 directory, delete all of its existing contents, and copy everything from the Llama 3.1 folder into the now-empty Llama 3 directory. By replacing the contents of the Llama 3 directory with those from the Llama 3.1 folder, you trick the script into thinking it is loading the Llama 3 version while actually using the tokenizer details and other configuration from Llama 3.1.
4. Update script parameters: passing local_files_only=True as a kwarg to from_pretrained_no_processing enforces local model usage; test it by loading the Llama 3 version first. Once that is confirmed, apply the replacement method described above. Keep the model names in the script pointing at Llama 3.0, for example:

```python
MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"
NEW_MODEL_ID = "mylesgoose/Meta-Llama-3.1-70B-abliterated"
MODEL_TYPE = "meta-llama/Meta-Llama-3-70B-Instruct"
```

This way, the script will report the model type as Llama 3 but will actually use the replaced Llama 3.1 files. This approach should allow your script to function as if it's using Llama 3, but with the updated details from Llama 3.1.
@mylesgoose Are you interested in opening a PR, and adding the family officially? I am happy to do it, but you already did the work to support it, and I don't want to step on your toes when you could be credited officially on the repo.
I think it is already supported: the 70B models and the 3.2 models. I ran a script yesterday and checked inside the model template, and it has been updated to include the 3.2 models. I am currently working on the vision models, which I have managed to get working for the text part so far, on 3.2 11B. Will do a PR when finished. Cheers.
@mylesgoose 3.1 models are not currently supported. 3.2 is supported; 3.1 is not. If you want to add them, I can put a release up quickly to add 3.1.
Okay, I'll do it for you tomorrow. I can see that the 3.1 8B and 70B models are not there; you're right. The new ones were 3.2 1B and 3B. I have managed to load the 3.1 8B and 70B models and hook them. I also managed to hook the 3.2 11B vision model today, but it was outputting garbage; still, I'm making progress. How can I modify the files here? Do you want me to modify another repo and then do pull requests?
Yep! Are you on the open source Slack? I can get you on there if you like, and we can set up a call today to go through the process. The vision models are probably going to be slightly more complicated. I am working with someone on LLaVA right now, and honestly we are not sure of the way to proceed for adding different types of models to TransformerLens. It is an open debate at the moment, with me preferring the idea of turning TransformerLens into a platform with a way to extend the base library for things like vision models, which would in turn keep TransformerLens itself purely focused on making interpretability of LLMs as good as it can be. It's an open debate though. For the time being, if you can get a vision model to load with minimal fuss, then I think we would be happy to add the support. It may take more than it appears on the surface though. If you need help, let me know!
I made a couple of LLaVA models, so I'm a bit familiar with them. As for the open source Slack, I don't know about it. Regarding the repo: the issue with the multimodal model was that I had to pass the cfg into HookedTransformerConfig, and I tried for days to flatten the nested config out, but since there are duplicated keys between the text config and the vision config, it was a nightmare. Eventually I settled for diverting HookedTransformer, if it detected a nested text/vision model config, to a new HookedVisionTransformerConfig class. So that's what I'm working on tomorrow. I actually managed to get a new abliterated model saved from the process, but it only saved the text part of the model, so I have had to rethink and learn more. I also went into the core Mllama code in the transformers library and found a gold mine, haha; it's pretty well explained how to do everything. The only problem with the way I'm doing it is that if you want to add support for other vision models, you would then have to add a new clause into the hooked transformer config to pass that to the second vision HookedTransformer class. I just passed the entire cfg from the model declaration page for the vision model to the HT config and dealt with it there. One confusing thing with the vision models is that the weight names change depending on how deep in the transformers library you are: for example, they change from model.weights... to language_model.weights..., vision_model..., etc., and are then passed down another layer. Quite confusing.
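A rough sketch of that dispatch idea under stated assumptions: HookedVisionTransformerConfig is a placeholder for the author's work-in-progress class and is not part of TransformerLens, and from_nested_hf_config is an illustrative name only.

```python
# Illustrative sketch only: HookedVisionTransformerConfig is hypothetical and NOT part of TransformerLens.
from transformers import AutoConfig
from transformer_lens import HookedTransformerConfig
from transformer_lens.loading_from_pretrained import convert_hf_model_config


class HookedVisionTransformerConfig:
    """Hypothetical container keeping text and vision configs separate instead of flattening them."""

    def __init__(self, text_config, vision_config):
        self.text_config = text_config
        self.vision_config = vision_config

    @classmethod
    def from_nested_hf_config(cls, hf_config):
        return cls(hf_config.text_config, hf_config.vision_config)


def build_hooked_config(model_name: str):
    hf_config = AutoConfig.from_pretrained(model_name)
    # Multimodal checkpoints such as Llama 3.2 Vision expose nested text/vision configs with
    # overlapping key names, so they can't be flattened into a single HookedTransformerConfig.
    if hasattr(hf_config, "text_config") and hasattr(hf_config, "vision_config"):
        return HookedVisionTransformerConfig.from_nested_hf_config(hf_config)
    # Text-only models keep going through the existing TransformerLens path.
    return HookedTransformerConfig.from_dict(convert_hf_model_config(model_name))
```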
https://github.com/mylesgoose/TransformerLens @ssuukk @bryce13950
It would be great if TransformerLens supported vision models as well. There are recent abliterated vision models, but I'm not sure how they were made. There seems to be a method to remove refusals with just Transformers, but it didn't work with Llama-3.2-Vision. I'd appreciate any tip for getting Llama-3.2-Vision models to work with FailSpy/abliterator or Sumandora/remove-refusals-with-transformers. Thanks!
@chigkim you can just add the vision part from the model you're trying to modify back onto the base model after you have abliterated the base model.
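One possible way to do that graft, assuming the Mllama layout in recent transformers (paths are placeholders, and this assumes the abliterated text model keeps the same architecture and parameter names as the original language_model; if it was converted to a plain Llama architecture, the keys won't line up):

```python
# Rough sketch: graft the untouched vision tower onto an abliterated language model.
# Assumes transformers >= 4.45 (Mllama support); paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, MllamaForConditionalGeneration

vision_model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct", torch_dtype=torch.bfloat16
)
abliterated_text = AutoModelForCausalLM.from_pretrained(
    "path/to/abliterated-language-model", torch_dtype=torch.bfloat16  # placeholder path
)

# Replace only the language-model weights; leave vision_model and the projector untouched.
vision_model.language_model.load_state_dict(abliterated_text.state_dict())
vision_model.save_pretrained("Llama-3.2-11B-Vision-abliterated")  # placeholder output dir
```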
The Llama-3.2-Vision model seems to come as one combined set of safetensors. https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/tree/main @mylesgoose Do you know how to separate the language and vision parts and combine them back together?
Here @chigkim
Wow, thanks @mylesgoose for the script! I was successfully able to extract the 8B language model from the 11B vision-language model. Now I need to find a script that can abliterate it. I ran the script from Sumandora/remove-refusals-with-transformers and saved the result into safetensors with no error. However, when I tried to load the modified 8B language model with transformers and chat with it, it behaves like the original model and refuses non-safe requests. I guess there's more of the puzzle to figure out, but thanks again!