Mayank Mishra
I think your environment is configured with CUDA 11.1, while torch was compiled with CUDA 10.2. Can you install a torch build that matches your CUDA version?
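A quick way to confirm the mismatch (a minimal sketch; the exact versions printed will depend on your install):

```python
import torch

# The CUDA version torch was built against must match the system toolkit
# (e.g. 11.1 here) for extensions and custom kernels to compile and load.
print("torch built with CUDA:", torch.version.cuda)  # e.g. "10.2"
print("CUDA runtime available:", torch.cuda.is_available())
```

If the first line disagrees with what `nvcc --version` reports, reinstalling torch from the matching CUDA wheel index usually fixes it.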
The dockerfile works out of the box. Can you give it a shot?
Not sure why 176b is not working. I will try to look into it :)
Huh? Int4? I will definitely test this branch and let you know. Thanks a lot for this :)
hey this is awesome
@sroecker are you tying the word embeddings? Unlike llama, the input word embeddings and output projection matrix are tied for granite models.
the lab version is a different model not to be confused with this one
Hmm, a quick question: are we tying the word embeddings and output logits matrix? llama doesn't do that and granite has tied embeddings. Maybe that's the issue? I don't think...
Hmm, ok, so there are these differences between llama and granite:
1. attention has bias (llama doesn't)
2. mlp has bias (llama doesn't)
3. tied word embeddings (llama doesn't)
4. ...
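The differences above can be sketched in PyTorch (hypothetical dimensions, not the real granite config; just to show where the deltas from llama sit):

```python
import torch.nn as nn

vocab, hidden = 1000, 64  # toy sizes for illustration

embed = nn.Embedding(vocab, hidden)

# granite: linear layers in attention and MLP carry a bias term,
# whereas llama constructs them with bias=False.
attn_qkv = nn.Linear(hidden, 3 * hidden, bias=True)
mlp_up = nn.Linear(hidden, 4 * hidden, bias=True)

# granite: output projection shares the input embedding weight;
# llama keeps lm_head as a separate, untied matrix.
lm_head = nn.Linear(hidden, vocab, bias=False)
lm_head.weight = embed.weight

# Tied means the two modules point at the same tensor storage.
assert lm_head.weight.data_ptr() == embed.weight.data_ptr()
```

A converter written against llama will silently drop the biases and untie the head unless it checks for these, which is the kind of thing that produces garbage outputs rather than a hard error.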
yeah, all of them use the starcoder tokenizer.