Nikhil Gupta
I cross-checked the **llama2 7b** model; I was expecting the size of the blocks to be the same, since each block is exactly the same model with the same dimensions...
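For reference, a minimal sketch of the kind of size comparison I had in mind (the per-block file names like `block_0.mnn` are placeholders for whatever the export actually produces, not MNN's real layout):

```cpp
// Compare on-disk sizes of per-block exported files; equal blocks should
// produce (roughly) equal blob sizes.
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <string>

int main() {
    namespace fs = std::filesystem;
    const int numBlocks = 32;  // llama2 7b has 32 transformer blocks
    for (int i = 0; i < numBlocks; ++i) {
        std::string path = "block_" + std::to_string(i) + ".mnn";  // assumed name
        if (fs::exists(path)) {
            std::printf("%s : %ju bytes\n", path.c_str(),
                        static_cast<std::uintmax_t>(fs::file_size(path)));
        }
    }
    return 0;
}
```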
Okay, so Type 1 and Type 2 quantization for convolutions are based on sparse vs. compressed blob size for quantized weights, and I am guessing MNN has both implementations...
I see that you have added support for TinyLlama conversion. This is exactly the way I did it and got it working. My multilingual model is being trained right now. I...
> > I see that you have added support for TinyLlama conversion. This is exactly the way I did it and got it working. My multilingual model is being trained right...
Understood, thanks. I am happy that this bug was caught and a fix is in progress. _I am just guessing that the size of the data structure storing the embedding weight is...
@wangzhaode I tried finding the overflow in the MNN model export as per your advice. I was not able to locate an overflow, as vocab_size * hidden_size * sizeof(float) < INT_MAX...
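A minimal sketch of the arithmetic I checked (the hidden size here is an assumption for illustration; only the 160984 vocab size comes from my model):

```cpp
// Check whether the embedding blob size vocab_size * hidden_size * sizeof(float)
// would overflow a 32-bit int. Variable names are illustrative, not MNN's code.
#include <climits>
#include <cstdio>

int main() {
    long long vocab_size  = 160984;   // vocab size of my multilingual model
    long long hidden_size = 2048;     // assumed TinyLlama-style hidden dimension
    long long bytes = vocab_size * hidden_size
                    * static_cast<long long>(sizeof(float));

    std::printf("embedding bytes = %lld, INT_MAX = %d\n", bytes, INT_MAX);
    std::printf("overflows int?  %s\n", bytes > INT_MAX ? "yes" : "no");
    return 0;
}
```

With these numbers the blob is about 1.32 GB, which is still below INT_MAX, so the plain size computation alone does not explain an overflow.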
@wangzhaode any help or guidance regarding this? > @wangzhaode I tried finding the overflow in the MNN model export as per your advice. I was not able to locate an overflow...
I printed all the inputs to my ArgMax layer for the lm head. The maximum value is always at index "1" out of the 160984 vocab size. This is very weird and I...
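For context, this is roughly the kind of CPU-side check I used to inspect the logits feeding the ArgMax (a sketch; the way `logits` is filled from the lm-head output is assumed, not shown):

```cpp
// Recompute the argmax over the lm-head logits on the CPU and print the
// winning index, to compare against what the ArgMax layer reports.
#include <cstdio>
#include <vector>

int main() {
    const int vocabSize = 160984;
    // `logits` stands in for the tensor that actually feeds the ArgMax layer.
    std::vector<float> logits(vocabSize, 0.0f);
    // ... fill `logits` from the lm-head output before ArgMax ...

    int best = 0;
    for (int i = 1; i < vocabSize; ++i) {
        if (logits[i] > logits[best]) best = i;
    }
    std::printf("argmax index = %d, value = %f\n", best, logits[best]);
    return 0;
}
```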