Force unsupported models
Adding the "force_unsupported_model" feature
Description
I added the option "force_unsupported_model" to HookedTransformer.from_pretrained. When this parameter is set to True, the new code simply skips the verification against the list of "official model names"; the default is False. A warning message is printed when the option is enabled.
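For illustration, the behaviour can be sketched roughly like this. This is a simplified stand-in rather than the actual diff: the `OFFICIAL_MODEL_NAMES` subset and the `check_model_name` helper below are illustrative only, not the real TransformerLens internals.

```python
import logging

OFFICIAL_MODEL_NAMES = ["gpt2", "mistralai/Mistral-7B-v0.1"]  # illustrative subset

def check_model_name(model_name: str, force_unsupported_model: bool = False) -> str:
    """Sketch of the proposed behaviour: skip the official-name check when forced."""
    if model_name in OFFICIAL_MODEL_NAMES:
        return model_name
    if force_unsupported_model:
        # Proposed behaviour: warn instead of raising, and let loading proceed.
        logging.warning(
            "%s is not in the list of officially supported models; "
            "loading anyway because force_unsupported_model=True.",
            model_name,
        )
        return model_name
    # Current behaviour: unsupported names are rejected outright.
    raise ValueError(f"{model_name} is not an officially supported model.")
```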
The reason for this addition is that researchers from non-English-speaking countries may find the TransformerLens library useful for interpretability studies on models in their own languages, and these models are usually not included in the list of supported models. Because there are many languages in the world and many different monolingual models, it would be hard to manually add all of them to the list. Instead, this option lets the user force the loading of an unsupported model, which can work perfectly well when the model is based on a common architecture.
For example, thanks to this fix I was able to load several Italian models with no issues:
- LorenzoDeMattei/GePpeTto (GPT2)
- sapienzanlp/Minerva-350M-base-v1.0 (Mistral)
- sapienzanlp/Minerva-1B-base-v1.0 (Mistral)
- sapienzanlp/Minerva-3B-base-v1.0 (Mistral)
- Almawave/Velvet-2B (Mistral)
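For instance, loading the first of these looks roughly like this, assuming the flag sits on `from_pretrained` exactly as described above; the printed config fields are just a sanity check:

```python
from transformer_lens import HookedTransformer

# GePpeTto is an Italian GPT-2 model that is not in the supported-models list;
# with the proposed flag it loads anyway (a warning is printed).
model = HookedTransformer.from_pretrained(
    "LorenzoDeMattei/GePpeTto",
    force_unsupported_model=True,  # default is False
)
print(model.cfg.n_layers, model.cfg.d_model)
```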
Sometimes it doesn't work out of the box, for example with sapienzanlp/Minerva-7B-instruct-v1.0 (Mistral), but I still think this option is valuable for testing without hand-editing the list of supported models every time.
Fixes # (issue)
Type of change
Please delete options that are not relevant.
- [ ] New feature (non-breaking change which adds functionality)
- [ ] This change requires a documentation update
Checklist:
- [X] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [X] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] I have not rewritten tests relating to key interfaces which would affect backward compatibility
This doesn't entirely align with the direction of TransformerLens the way this PR is currently structured. The roadmap for TransformerLens 4.0 is to add a number of hooks for the creation of plugins, which would solve the problem you are addressing here more robustly. I am not entirely opposed to adding something like this for the time being, but I need to think about how I would want it structured so that it helps us on our roadmap. If it is added without that in mind, it may be removed in a few months, and I would rather add it in a compatible way now so that people don't get used to functionality that disappears.
Would you maybe be available to help add this as the initial plugin support? This would entail adding a function named extend that takes in a plugin config dictionary; that dictionary would provide a little bit of information on additional models to support. When TransformerLens boots, it would then use that information to make more models available to the user. It's a little more effort than what you have done so far, but a boolean variable is not going to give us enough flexibility to make something like this work. More importantly, there are a number of potential fine-tuning tasks that may need to be done for specific models, which are not possible at all with a boolean variable, and that can lead to hidden inaccuracies that could invalidate research.
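To make that concrete, a hypothetical shape for such a plugin config might be the following. The `extend` entry point is the one proposed above; every field name in the dictionary is an assumption for illustration, not an agreed interface.

```python
import transformer_lens  # the extend() entry point below is the proposed, not yet existing, API

# Hypothetical plugin config: each entry tells TransformerLens how to treat an
# extra model, including per-model adjustments that a plain boolean could not express.
ITALIAN_MODELS_PLUGIN = {
    "models": [
        {
            "name": "LorenzoDeMattei/GePpeTto",
            "architecture": "GPT2LMHeadModel",  # which existing conversion to reuse
            "config_overrides": {"tokenizer_prepends_bos": False},  # example per-model tweak
        },
        {
            "name": "sapienzanlp/Minerva-350M-base-v1.0",
            "architecture": "MistralForCausalLM",
        },
    ],
}

# At boot/registration time the plugin would make the extra models available
# through the normal from_pretrained path.
transformer_lens.extend(ITALIAN_MODELS_PLUGIN)
```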
If you are available to add this, then we can get this set up in a way that could become very robust for everyone currently using TransformerLens. If you don't have the time, then I would be happy to prioritize it so that it can be made available ASAP.