Add `allenai/OLMoE-1B-7B-0924`.
This is a new MoE model which I'd like to use with TL. Notes:
- `transformers` hasn't released a version with OLMoE support yet. We can update `pyproject.toml` to point to it instead of GitHub once it's released. Will leave this as a draft until then.
- There are three features which I haven't implemented yet. You can see traces where they're commented out in the code. I don't plan on using them and am inclined not to include them for now:
  - `router_aux_loss_coef` / `router_z_loss_coef`: I don't plan on training OLMoE in TL, so there's no need for these coefficients.
  - `norm_topk_prob` defaults to `False` in `transformers` and I don't plan to use it (see the sketch after this list).
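For context on that last one, `norm_topk_prob` controls whether the kept top-k router probabilities are renormalized to sum to 1 before weighting the expert outputs. A minimal sketch of the distinction (illustrative only, not the `transformers` implementation):

```python
import torch

# Toy router: route 4 tokens across 8 experts, keeping the top 2 per token.
logits = torch.randn(4, 8)                    # (tokens, experts)
probs = logits.softmax(dim=-1)
topk_probs, topk_idx = probs.topk(2, dim=-1)  # weights and indices of kept experts

norm_topk_prob = False  # the transformers default, and what this PR assumes
if norm_topk_prob:
    # Renormalize so each token's kept weights sum to 1.
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
# With norm_topk_prob=False, the raw softmax mass is used as-is.
```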
Commenting out `add_bos_token=True`
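For reference, the `test.py` driving the traces below isn't included in this PR; judging from the tracebacks (only line 4 is visible), it is presumably something like:

```python
import transformer_lens

# Hypothetical reconstruction of test.py; the actual script isn't shown in this PR.
model = transformer_lens.HookedTransformer.from_pretrained(
    "allenai/OLMoE-1B-7B-0924",
)
```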
This is a temporary fix. When running with neither location commented out:
```
joel@simplex ~/c/g/TransformerLens (add-olmoe) [1]> python3 test.py
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████| 3/3 [00:47<00:00, 15.95s/it]
Traceback (most recent call last):
File "/Users/joel/code/github/TransformerLens/test.py", line 4, in <module>
model = transformer_lens.HookedTransformer.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joel/code/github/TransformerLens/transformer_lens/HookedTransformer.py", line 1300, in from_pretrained
model = cls(
^^^^
File "/Users/joel/code/github/TransformerLens/transformer_lens/HookedTransformer.py", line 146, in __init__
AutoTokenizer.from_pretrained(
File "/Users/joel/code/github/TransformerLens/.venv/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 901, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joel/code/github/TransformerLens/.venv/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2214, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/Users/joel/code/github/TransformerLens/.venv/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2448, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joel/code/github/TransformerLens/.venv/lib/python3.11/site-packages/transformers/models/gpt_neox/tokenization_gpt_neox_fast.py", line 123, in __init__
self.update_post_processor()
File "/Users/joel/code/github/TransformerLens/.venv/lib/python3.11/site-packages/transformers/models/gpt_neox/tokenization_gpt_neox_fast.py", line 159, in update_post_processor
raise ValueError("add_bos_token = True but bos_token = None")
ValueError: add_bos_token = True but bos_token = None
```
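For clarity, the call site being commented out is, paraphrased from the trace (not the exact file contents; the `tokenizer_name` argument is an assumption):

```python
# transformer_lens/HookedTransformer.py, around lines 145-146 (paraphrased):
self.set_tokenizer(
    AutoTokenizer.from_pretrained(
        self.cfg.tokenizer_name,
        add_bos_token=True,  # the kwarg that GPTNeoXTokenizerFast rejects here
    ),
)
```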
Commenting out the location mentioned in the stack trace (`HookedTransformer.py:146`):
```
joel@simplex ~/c/g/TransformerLens (add-olmoe) [1]> python3 test.py
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████| 3/3 [00:46<00:00, 15.33s/it]
Traceback (most recent call last):
File "/Users/joel/code/github/TransformerLens/test.py", line 4, in <module>
model = transformer_lens.HookedTransformer.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joel/code/github/TransformerLens/transformer_lens/HookedTransformer.py", line 1300, in from_pretrained
model = cls(
^^^^
File "/Users/joel/code/github/TransformerLens/transformer_lens/HookedTransformer.py", line 145, in __init__
self.set_tokenizer(
File "/Users/joel/code/github/TransformerLens/transformer_lens/HookedTransformer.py", line 677, in set_tokenizer
tokenizer_with_bos = utils.get_tokenizer_with_bos(tokenizer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joel/code/github/TransformerLens/transformer_lens/utils.py", line 1172, in get_tokenizer_with_bos
tokenizer_with_bos = AutoTokenizer.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joel/code/github/TransformerLens/.venv/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 901, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joel/code/github/TransformerLens/.venv/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2214, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/Users/joel/code/github/TransformerLens/.venv/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2448, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joel/code/github/TransformerLens/.venv/lib/python3.11/site-packages/transformers/models/gpt_neox/tokenization_gpt_neox_fast.py", line 123, in __init__
self.update_post_processor()
File "/Users/joel/code/github/TransformerLens/.venv/lib/python3.11/site-packages/transformers/models/gpt_neox/tokenization_gpt_neox_fast.py", line 159, in update_post_processor
raise ValueError("add_bos_token = True but bos_token = None")
ValueError: add_bos_token = True but bos_token = None
```
I'd appreciate advice on what's going wrong here. I'm a bit confused because I didn't change anything related to BOS tokens (and e.g. the call to `AutoTokenizer.from_pretrained` in `HookedTransformer` always specifies `add_bos_token=True` but never `bos_token`).
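One hedged guess at a workaround, assuming the GPT-NeoX convention of reusing the EOS token as BOS (untested; the token choice is an assumption, not something from this PR):

```python
from transformers import AutoTokenizer

# Hypothetical workaround: supply an explicit bos_token so
# GPTNeoXTokenizerFast.update_post_processor() has one to insert.
tokenizer = AutoTokenizer.from_pretrained(
    "allenai/OLMoE-1B-7B-0924",
    add_bos_token=True,
    bos_token="<|endoftext|>",  # assumption: GPT-NeoX-style models often reuse EOS as BOS
)
```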
Type of change
- [x] New feature (non-breaking change which adds functionality)
Checklist:
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [x] I have not rewritten tests relating to key interfaces which would affect backward compatibility
It looks like Compatibility Checks (3.9) failed because of incompatible numpy versions.
There are a lot of issues in this PR due to dependency bumping. None of that has anything to do with what has been done here; there are general issues at the moment with dependency versions, and I started messing with them in this PR. In order to add these models officially, we probably need to get that resolved first. I will prioritize it a bit further up the line so you can finish what you are doing.
@joelburget I am working on https://github.com/jonasrohw/TransformerLens/tree/OLMo; I think your MoE is very similar. I found the issue you were facing: the tokenizer is called again after `tokenizer_with_bos = utils.get_tokenizer_with_bos(tokenizer)`. Maybe you can merge your MoE implementation into this code? I am looking at OLMo-v2 now, and then we could ship it all together. WDYT?
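(Concretely, per the second trace above: `get_tokenizer_with_bos` rebuilds the tokenizer, and that rebuild passes `add_bos_token=True` again. A paraphrase of `transformer_lens/utils.py:1172`, not the exact source:)

```python
# Paraphrased from utils.get_tokenizer_with_bos, per the trace above:
# the tokenizer is constructed a second time with add_bos_token=True,
# so commenting out only the __init__ call site isn't enough.
tokenizer_with_bos = AutoTokenizer.from_pretrained(
    tokenizer.name_or_path,
    add_bos_token=True,  # bos_token is still None here -> same ValueError
)
```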
Hey @jonasrohw, thanks for looping me in. Your code looks much more complete than mine, so I want to make sure I understand the bit that you're suggesting we merge in (and how). The two things this implementation has that yours doesn't:
- The change in `transformer_lens/components/mlps/moe.py`
- Disabling `add_bos_token` in a few places.

Are you suggesting I merge my `transformer_lens/components/mlps/moe.py` into your branch?
@joelburget Exactly. You can also conditionally add the MoE weights import into the Olmo file. You could include your model names, etc., in the preloading with the exact model configurations for MoE.
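A rough sketch of what that conditional weight loading could look like (illustrative only; the `convert_olmoe_weights` / `convert_olmo_weights` helper names are hypothetical, not code from either branch):

```python
def select_weight_conversion(hf_config):
    """Illustrative only: pick a weight-conversion function based on the
    HF architecture string; the helper names below are hypothetical."""
    architecture = hf_config.architectures[0]
    if architecture == "OlmoeForCausalLM":  # the MoE variant
        return convert_olmoe_weights         # hypothetical MoE converter
    return convert_olmo_weights              # hypothetical dense-OLMo converter
```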
Thanks @jonasrohw. I opened https://github.com/jonasrohw/TransformerLens/pull/1. I still need to finish the one TODO and do testing but I can hopefully finish this weekend.
Closing this in favor of #816.