SwiGLU Issue
Hey there, appreciate what you guys are doing, it's great work. I'm trying to access the model weights from HF using the transformers library, but I'm stuck on a SwiGLU error; any help with that would be really great. Secondly, where can I find a direct implementation of the attn-360m or 1.4b variant? I have a 1-billion-token dataset extracted from the Pile that I want to use for an off-the-shelf training run on the attn-360m models!
Hi, what is the error? The implementation configs are provided in `train/configs/experiments/reference/`.
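If you want to launch one of those configs, something like the following should work; this is a sketch assuming the repo's Hydra-style `train/run.py` entry point, with the config name left as a placeholder for whatever file lives under `reference/`:

```bash
cd train
# <config-name> is a placeholder for a file under configs/experiments/reference/
python run.py experiment=reference/<config-name>
```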
Hey, I think the SwiGLU issue appears when you try to use the transformers GPT2LMHeadModel to load the model. In any case, I switched to loading the model with this snippet:

```python
import torch
from transformers import AutoTokenizer
from based.models.transformer.gpt import GPTLMHeadModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPTLMHeadModel.from_pretrained_hf("hazyresearch/attn-360m").to("cuda")
```
My question here is: can I use these weights directly to test the model's accuracy, or say its perplexity, on the Pile test set? Just inference and testing in eval mode?
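For concreteness, the kind of eval-mode loop I have in mind is sketched below. It's only a rough sketch, assuming the model's forward returns HF-style `.logits` of shape `(batch, seq_len, vocab)`, and `pile_texts` stands in for my own list of raw Pile test documents:

```python
import math

import torch


@torch.no_grad()
def pile_perplexity(model, tokenizer, pile_texts, device="cuda", max_len=1024):
    """Rough perplexity estimate over a list of raw text documents.

    Assumes model(input_ids) returns an object with HF-style .logits;
    the function and argument names here are hypothetical.
    """
    model.eval()
    total_nll, total_tokens = 0.0, 0
    for text in pile_texts:
        ids = tokenizer(text, return_tensors="pt").input_ids[:, :max_len].to(device)
        if ids.size(1) < 2:  # need at least one next-token target
            continue
        logits = model(ids).logits
        # shift so that position t predicts token t+1
        nll = torch.nn.functional.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),
            ids[:, 1:].reshape(-1),
            reduction="sum",
        )
        total_nll += nll.item()
        total_tokens += ids.size(1) - 1
    return math.exp(total_nll / total_tokens)
```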
@simran-arora @seyuboglu Really sorry for pinging you guys here, but can you guide me a little bit on this?
Not sure how you got this, but I saw this error because the script was trying to load the HuggingFace transformers attention file rather than the based version.
The fix was explicitly installing `based` per the README:
```bash
# clone the repository
git clone [email protected]:HazyResearch/based.git
cd based

# install torch (this pin is due to an observed causal-conv1d dependency)
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118

# install the based package
pip install -e .
```
and making sure you use `from based.models.gpt import GPTLMHeadModel` instead of a generic `transformers.AutoModel`.
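For reference, a minimal end-to-end sanity check after installing might look like this. It's a sketch that reuses the hub id from earlier in the thread with the matching transformer-variant import, and the `.logits` access assumes the same HF-style output the earlier snippet relies on:

```python
import torch
from transformers import AutoTokenizer
# use the based-package import rather than transformers.AutoModel;
# for the attention checkpoint from earlier, the transformer variant applies
from based.models.transformer.gpt import GPTLMHeadModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPTLMHeadModel.from_pretrained_hf("hazyresearch/attn-360m").to("cuda").eval()

# quick sanity check: next-token logits on a short prompt
ids = tokenizer("The quick brown fox", return_tensors="pt").input_ids.to("cuda")
with torch.no_grad():
    out = model(ids)
print(out.logits.shape)  # expect (1, seq_len, vocab_size)
```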
There were a few other errors I got, but they were all fixed by following the install steps listed in this other issue: https://github.com/HazyResearch/based/issues/3#issuecomment-1979312070