Nikhil Gupta

28 comments by Nikhil Gupta

Hello @jrudolph, can you please help me understand ggml matmul execution with respect to quantization? Is it input_float32 * Quantized_weights -> Output_float32, or Quantize(input_float32) * Quantized_Weights -> Output_float32 -> Quantize(Output_float32) for...
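The second flow (activations quantized before the integer dot product) can be illustrated with a minimal NumPy sketch. This is an illustration of block-wise symmetric int8 quantization in the style of ggml's q8_0, not ggml's actual implementation; block size and shapes are assumptions.

```python
import numpy as np

np.random.seed(0)

def quantize_q8(x, block=32):
    # Symmetric per-block int8 quantization (q8_0-style sketch).
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from int8 codes and block scales.
    return (q.astype(np.float32) * scale).reshape(-1)

# A float32 activation vector and a pre-quantized weight row.
act_f32 = np.random.randn(64).astype(np.float32)
w_q, w_s = quantize_q8(np.random.randn(64).astype(np.float32))

# Flow: Quantize(input_float32) * Quantized_weights -> Output_float32
a_q, a_s = quantize_q8(act_f32)
# Integer dot per block, rescaled by the two block scales, accumulated in f32.
out_f32 = np.sum((a_q.astype(np.int32) * w_q.astype(np.int32)).sum(axis=1)
                 * (a_s.squeeze(1) * w_s.squeeze(1)))
print(out_f32)
```

The integer products plus per-block rescaling give exactly the same result as dequantizing both operands and doing a float dot product, which is why the integer path is preferred.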

Thanks for your reply. When you say one-dimensional vector, are you talking about the input 1-D vector? Sorry if this is a basic question. Thanks. On Sun, Aug 13,...

If I understood it correctly, it means that for the matmul we have to quantize the input 1-D array. I am wondering if the latency to quantize this vector can surpass...
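Quantizing the activation vector is O(n) while the matvec it feeds is O(n²), so the overhead should amortize. A rough timing sketch (sizes and quantization scheme are assumptions, and absolute numbers depend on the machine and BLAS):

```python
import time
import numpy as np

np.random.seed(0)
n = 4096
x = np.random.randn(n).astype(np.float32)
W = np.random.randn(n, n).astype(np.float32)

def q8(v):
    # Symmetric whole-vector int8 quantization (sketch).
    s = max(float(np.abs(v).max()) / 127.0, 1e-12)
    return np.round(v / s).astype(np.int8), s

reps = 20
t0 = time.perf_counter()
for _ in range(reps):
    q8(x)                      # O(n): one pass to scale and round
t1 = time.perf_counter()
for _ in range(reps):
    W @ x                      # O(n^2): the matvec itself
t2 = time.perf_counter()

t_quant, t_matmul = (t1 - t0) / reps, (t2 - t1) / reps
print(f"quantize: {t_quant*1e6:.1f} us, matvec: {t_matmul*1e6:.1f} us")
```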

> you can add `--export_test` to verify the onnx model. `--export_test` will run onnx with onnxruntime and compare with torch. Yes, thank you for your response. I tried `--export_test` and...

```
onnx test SUCCESS
Don't has bizCode, use MNNTest for default
Start to Convert Other Model Format To MNN Model..., target version: 2.8
[15:30:40] :46: ONNX Model ir version: 8...
```

My custom model with extended vocab config.json file:

```
{
  "architectures": [ "LlamaForCausalLM" ],
  "auto_map": { "AutoModelForCausalLM": "modeling_llama.LlamaForCausalLM" },
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size":...
```
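The truncated config can be sanity-checked programmatically; the numeric values below are made-up placeholders, not the actual model's, but the field names mirror the standard Llama config:

```python
import json

# Placeholder config mirroring the truncated config.json above; the numeric
# values are illustrative assumptions only.
cfg = json.loads("""
{
  "architectures": ["LlamaForCausalLM"],
  "hidden_size": 4096,
  "vocab_size": 64000,
  "num_attention_heads": 32,
  "num_key_value_heads": 8
}
""")

# The embedding / lm-head weight is (vocab_size, hidden_size), so an extended
# vocab grows lm.onnx proportionally.
lm_params = cfg["vocab_size"] * cfg["hidden_size"]

# GQA shows up as fewer key/value heads than attention heads.
is_gqa = cfg["num_key_value_heads"] < cfg["num_attention_heads"]
print(lm_params, is_gqa)
```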

Hey @wangzhaode, can you give me a clue about what I should fix to get my GQA model with a large vocab size working?

> Can you convert large vocab size `lm.onnx` to `lm.mnn`?

Yes, I can convert lm.onnx to lm.mnn. lm.mnn lm.onnx

I dumped my block_0.mnn to block_0.json. I can see that the bias values are all 0.0 in my JSON, while the bias values are non-zero in the Qwen-1.8B block_0 JSON...
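A schema-agnostic scan for all-zero bias arrays can confirm this across the whole dump. This is a generic recursive walk, not tied to MNN's exact JSON layout (the `demo` structure below is a made-up stand-in for a real dump):

```python
import json

def find_zero_biases(node, path=""):
    """Recursively find 'bias' arrays that are entirely zero in a JSON dump."""
    hits = []
    if isinstance(node, dict):
        for k, v in node.items():
            p = f"{path}/{k}"
            if k == "bias" and isinstance(v, list) and v and all(x == 0.0 for x in v):
                hits.append(p)
            hits.extend(find_zero_biases(v, p))
    elif isinstance(node, list):
        for i, v in enumerate(node):
            hits.extend(find_zero_biases(v, f"{path}[{i}]"))
    return hits

# Hypothetical mini-dump standing in for block_0.json:
demo = {"oplists": [{"main": {"bias": [0.0, 0.0]}},
                    {"main": {"bias": [0.5, 0.1]}}]}
print(find_zero_biases(demo))  # ['/oplists[0]/main/bias']

# Real usage:
# with open("block_0.json") as f:
#     print(find_zero_biases(json.load(f)))
```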

I think I might have found the issue. @wangzhaode All block_#.mnn files should be the same size, right? My block_0.mnn, block_1.mnn, and block_21.mnn are all different sizes. When I...
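A quick way to spot the mismatched blocks is to list every exported block file with its size; assuming identical per-layer shapes, an outlier size points at a block whose weights failed to convert fully:

```python
import glob
import os

def block_sizes(pattern="block_*.mnn"):
    """Map each exported block file to its size in bytes, in name order."""
    return {p: os.path.getsize(p) for p in sorted(glob.glob(pattern))}

# With identical per-layer shapes, all block_#.mnn files should be roughly
# the same size; print them to find the outliers.
for path, size in block_sizes().items():
    print(f"{path}: {size:,} bytes")
```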