Nikhil Gupta

28 comments by Nikhil Gupta

Hello @jrudolph, can you please help me understand ggml matmul execution with respect to quantization? Is it input_float32 * Quantized_weights -> Output_float32, or Quantize(input_float32) * Quantized_Weights -> Output_float32 -> Quantize(Output_float32) for...
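The second flow (activations quantized before the integer dot product) can be illustrated with a minimal NumPy sketch. This is an illustration of block-wise symmetric int8 quantization in the style of ggml's q8_0, not ggml's actual implementation; block size and shapes are assumptions.

```python
import numpy as np

np.random.seed(0)

def quantize_q8(x, block=32):
    # Symmetric per-block int8 quantization (q8_0-style sketch).
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from int8 codes and block scales.
    return (q.astype(np.float32) * scale).reshape(-1)

# A float32 activation vector and a pre-quantized weight row.
act_f32 = np.random.randn(64).astype(np.float32)
w_q, w_s = quantize_q8(np.random.randn(64).astype(np.float32))

# Flow: Quantize(input_float32) * Quantized_weights -> Output_float32
a_q, a_s = quantize_q8(act_f32)
# Integer dot per block, rescaled by the two block scales, accumulated in f32.
out_f32 = np.sum((a_q.astype(np.int32) * w_q.astype(np.int32)).sum(axis=1)
                 * (a_s.squeeze(1) * w_s.squeeze(1)))
print(out_f32)
```

The integer products plus per-block rescaling give exactly the same result as dequantizing both operands and doing a float dot product, which is why the integer path is preferred.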

Thanks for your reply. When you say one-dimensional vector, are you talking about the input 1-D vector? Sorry if this is a basic question. Thanks. On Sun, Aug 13,...

If I understood it correctly, it means that for the matmul we have to quantize the input 1-D array. I am wondering if the latency to quantize this vector can surpass...
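Quantizing the activation vector is O(n) while the matvec it feeds is O(n²), so the overhead should amortize. A rough timing sketch (sizes and quantization scheme are assumptions, and absolute numbers depend on the machine and BLAS):

```python
import time
import numpy as np

np.random.seed(0)
n = 4096
x = np.random.randn(n).astype(np.float32)
W = np.random.randn(n, n).astype(np.float32)

def q8(v):
    # Symmetric whole-vector int8 quantization (sketch).
    s = max(float(np.abs(v).max()) / 127.0, 1e-12)
    return np.round(v / s).astype(np.int8), s

reps = 20
t0 = time.perf_counter()
for _ in range(reps):
    q8(x)                      # O(n): one pass to scale and round
t1 = time.perf_counter()
for _ in range(reps):
    W @ x                      # O(n^2): the matvec itself
t2 = time.perf_counter()

t_quant, t_matmul = (t1 - t0) / reps, (t2 - t1) / reps
print(f"quantize: {t_quant*1e6:.1f} us, matvec: {t_matmul*1e6:.1f} us")
```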

> you can add `--export_test` to verify the onnx model. `--export_test` will run onnx with onnxruntime and compare with torch. Yes, thank you for your response. I tried `--export_test` and...

```
onnx test SUCCESS
Don't has bizCode, use MNNTest for default
Start to Convert Other Model Format To MNN Model..., target version: 2.8
[15:30:40] :46: ONNX Model ir version: 8...
```

My custom model with extended vocab config.json file:

```
{
  "architectures": [ "LlamaForCausalLM" ],
  "auto_map": { "AutoModelForCausalLM": "modeling_llama.LlamaForCausalLM" },
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size":...
```
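The truncated config can be sanity-checked programmatically; the numeric values below are made-up placeholders, not the actual model's, but the field names mirror the standard Llama config:

```python
import json

# Placeholder config mirroring the truncated config.json above; the numeric
# values are illustrative assumptions only.
cfg = json.loads("""
{
  "architectures": ["LlamaForCausalLM"],
  "hidden_size": 4096,
  "vocab_size": 64000,
  "num_attention_heads": 32,
  "num_key_value_heads": 8
}
""")

# The embedding / lm-head weight is (vocab_size, hidden_size), so an extended
# vocab grows lm.onnx proportionally.
lm_params = cfg["vocab_size"] * cfg["hidden_size"]

# GQA shows up as fewer key/value heads than attention heads.
is_gqa = cfg["num_key_value_heads"] < cfg["num_attention_heads"]
print(lm_params, is_gqa)
```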

Hey @wangzhaode, can you give me a clue about what I should fix to get my GQA model with a large vocab size working?

> Can you convert large vocab size `lm.onnx` to `lm.mnn`?

Yes, I can convert lm.onnx to lm.mnn. lm.mnn lm.onnx

I dumped my block_0.mnn to block_0.json. I can see that the bias values are all 0.0 in my JSON, while the bias values are non-zero in the Qwen-1.8B block_0 JSON...
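A schema-agnostic scan for all-zero bias arrays can confirm this across the whole dump. This is a generic recursive walk, not tied to MNN's exact JSON layout (the `demo` structure below is a made-up stand-in for a real dump):

```python
import json

def find_zero_biases(node, path=""):
    """Recursively find 'bias' arrays that are entirely zero in a JSON dump."""
    hits = []
    if isinstance(node, dict):
        for k, v in node.items():
            p = f"{path}/{k}"
            if k == "bias" and isinstance(v, list) and v and all(x == 0.0 for x in v):
                hits.append(p)
            hits.extend(find_zero_biases(v, p))
    elif isinstance(node, list):
        for i, v in enumerate(node):
            hits.extend(find_zero_biases(v, f"{path}[{i}]"))
    return hits

# Hypothetical mini-dump standing in for block_0.json:
demo = {"oplists": [{"main": {"bias": [0.0, 0.0]}},
                    {"main": {"bias": [0.5, 0.1]}}]}
print(find_zero_biases(demo))  # ['/oplists[0]/main/bias']

# Real usage:
# with open("block_0.json") as f:
#     print(find_zero_biases(json.load(f)))
```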

I think I might have found the issue. @wangzhaode All block_#.mnn files should be the same size, right? My block_0.mnn, block_1.mnn, and block_21.mnn are all different sizes. When I...
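A quick way to spot the mismatched blocks is to list every exported block file with its size; assuming identical per-layer shapes, an outlier size points at a block whose weights failed to convert fully:

```python
import glob
import os

def block_sizes(pattern="block_*.mnn"):
    """Map each exported block file to its size in bytes, in name order."""
    return {p: os.path.getsize(p) for p in sorted(glob.glob(pattern))}

# With identical per-layer shapes, all block_#.mnn files should be roughly
# the same size; print them to find the outliers.
for path, size in block_sizes().items():
    print(f"{path}: {size:,} bytes")
```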