Jonas Rohw
Results: 2 comments by Jonas Rohw
@joelburget I am working on https://github.com/jonasrohw/TransformerLens/tree/OLMo; I think your MoE is very similar. I found the issue you were facing: the tokenizer is called again after `tokenizer_with_bos = utils.get_tokenizer_with_bos(tokenizer)`. Maybe...
@joelburget Exactly. You can also conditionally add the MoE weights import into the Olmo file. You could include your model names, etc., in the preloading with the exact model configurations...