SwiGLU Issue
Hey there, appreciate what you guys are doing, it's great work. I'm trying to access the model weights from HF using the transformers library, but I'm stuck on a SwiGLU error; any help with that would be really great. Secondly, where can I find a direct implementation of the attn-360m or 1.4b variant? I have a 1-billion-token dataset extracted from the Pile that I want to use for an off-the-shelf training run on the attn-360m models!
Hi, what is the error? The implementation configs are provided in `train/configs/experiments/reference/`.
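If you want to launch one of those configs, something like the following should work; this is a sketch assuming the repo's Hydra-style `train/run.py` entry point, with the config name left as a placeholder for whatever file lives under `reference/`:

```bash
cd train
# <config-name> is a placeholder for a file under configs/experiments/reference/
python run.py experiment=reference/<config-name>
```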
Hey, I think the SwiGLU issue appears when you try to use the transformers GPT2LMHeadModel to load the model. In any case, I switched to loading the model with this snippet:

```python
import torch
from transformers import AutoTokenizer
from based.models.transformer.gpt import GPTLMHeadModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPTLMHeadModel.from_pretrained_hf("hazyresearch/attn-360m").to("cuda")
```
My question here is: can I use these weights directly to test the model's accuracy, or say its perplexity, on the Pile test set? Just inference and testing in eval mode?
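For concreteness, the kind of eval-mode loop I have in mind is sketched below. It's only a rough sketch, assuming the model's forward returns HF-style `.logits` of shape `(batch, seq_len, vocab)`, and `pile_texts` stands in for my own list of raw Pile test documents:

```python
import math

import torch


@torch.no_grad()
def pile_perplexity(model, tokenizer, pile_texts, device="cuda", max_len=1024):
    """Rough perplexity estimate over a list of raw text documents.

    Assumes model(input_ids) returns an object with HF-style .logits;
    the function and argument names here are hypothetical.
    """
    model.eval()
    total_nll, total_tokens = 0.0, 0
    for text in pile_texts:
        ids = tokenizer(text, return_tensors="pt").input_ids[:, :max_len].to(device)
        if ids.size(1) < 2:  # need at least one next-token target
            continue
        logits = model(ids).logits
        # shift so that position t predicts token t+1
        nll = torch.nn.functional.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),
            ids[:, 1:].reshape(-1),
            reduction="sum",
        )
        total_nll += nll.item()
        total_tokens += ids.size(1) - 1
    return math.exp(total_nll / total_tokens)
```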
@simran-arora @seyuboglu Really sorry for pinging you guys here, but can you guide me a little bit on this?
Not sure how you got this, but I saw this error because the script was trying to load the HuggingFace transformers attention file rather than the based version.
The fix was explicitly installing `based` per the README:
```bash
# clone the repository
git clone [email protected]:HazyResearch/based.git
cd based

# install torch (this pin is due to an observed causal-conv1d dependency)
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118

# install the based package
pip install -e .
```
and making sure you use `from based.models.gpt import GPTLMHeadModel` instead of a generic `transformers.AutoModel`.
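For reference, a minimal end-to-end sanity check after installing might look like this. It's a sketch that reuses the hub id from earlier in the thread with the matching transformer-variant import, and the `.logits` access assumes the same HF-style output the earlier snippet relies on:

```python
import torch
from transformers import AutoTokenizer
# use the based-package import rather than transformers.AutoModel;
# for the attention checkpoint from earlier, the transformer variant applies
from based.models.transformer.gpt import GPTLMHeadModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPTLMHeadModel.from_pretrained_hf("hazyresearch/attn-360m").to("cuda").eval()

# quick sanity check: next-token logits on a short prompt
ids = tokenizer("The quick brown fox", return_tensors="pt").input_ids.to("cuda")
with torch.no_grad():
    out = model(ids)
print(out.logits.shape)  # expect (1, seq_len, vocab_size)
```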
There were a few other errors I got, but they were all fixed by following the install steps listed in this other issue: https://github.com/HazyResearch/based/issues/3#issuecomment-1979312070