Nikhil Gupta
> Hi @NicoJuicy. We are not currently supporting TF Lite, but this can definitely be an interesting feature to include in the future! In the coming days we will draw...
I am facing the same issue with one of the blocks of the LLM model that I am trying to convert.
Hello, were you able to fix this issue?
> More on this: recent koboldcpp build, Snapdragon 8 Gen 1, Termux.
>
> Any quant is garbled at GGUF model. k quant or not. Offloaded layers or not. GGML...
Hello @alankelly @wei-v-wang, how can we fix this issue if we are sticking to Ubuntu 16.04 and GCC 5.4.0? I have tried `#define _POSIX_C_SOURCE 199309L` as suggested by...
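
In case it helps others hitting this, here is a minimal sketch of how that macro is normally applied, assuming the underlying failure is the usual `clock_gettime` declaration problem on older glibc (this is an illustration, not the project's actual fix):

```c
/* The feature-test macro must come before ANY system header is included
 * (or be passed on the command line as -D_POSIX_C_SOURCE=199309L);
 * otherwise glibc has already frozen its feature set and the #define
 * has no effect. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

int main(void) {
    struct timespec ts;
    /* Declared only when _POSIX_C_SOURCE >= 199309L on old glibc;
     * glibc < 2.17 additionally needs linking with -lrt. */
    clock_gettime(CLOCK_REALTIME, &ts);
    printf("%ld.%09ld\n", (long)ts.tv_sec, ts.tv_nsec);
    return 0;
}
```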
Hey @naveengovind, did you manage to get a fix yet? I am facing the exact same issue right now for my use case. @xenova, do you have any further inputs on...
The `x` activation is fed back through multiple layers after being modified with the help of the attention output, and that is why, I guess, attention is needed for the input tokens as well.
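
To make that feedback path concrete, here is a minimal, self-contained sketch of the residual wiring; `attention` and `ffn` are reduced to placeholder stubs, and all names are illustrative:

```c
#include <stddef.h>
#include <stdio.h>

/* Placeholder stubs; a real model computes multi-head attention and a
 * feed-forward block here. */
static void attention(float *out, const float *x, size_t dim) {
    for (size_t i = 0; i < dim; i++) out[i] = 0.1f * x[i];
}
static void ffn(float *out, const float *x, size_t dim) {
    for (size_t i = 0; i < dim; i++) out[i] = 0.1f * x[i];
}

/* Residual wiring: each block writes into a scratch buffer and its
 * output is added back into x, so the activation reaching layer l+1
 * already carries every earlier attention/FFN contribution. */
static void forward_layers(float *x, float *tmp, size_t dim, int n_layers) {
    for (int l = 0; l < n_layers; l++) {
        attention(tmp, x, dim);
        for (size_t i = 0; i < dim; i++) x[i] += tmp[i];
        ffn(tmp, x, dim);
        for (size_t i = 0; i < dim; i++) x[i] += tmp[i];
    }
}

int main(void) {
    float x[4] = {1, 1, 1, 1}, tmp[4];
    forward_layers(x, tmp, 4, 2);
    printf("%f\n", x[0]);  /* 1.1^4 = 1.4641: two layers, two residual adds each */
    return 0;
}
```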
We can definitely avoid

```c
rmsnorm(x, x, w->rms_final_weight, dim);
// classifier into logits
matmul(s->logits, x, w->wcls, p->dim, p->vocab_size);
```

for the input tokens. It will give some perf bump.
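
A sketch of what that skip could look like, assuming llama2.c-style `Config`/`RunState`/`TransformerWeights` structs and `rmsnorm`/`matmul` helpers (stubbed out below so the snippet stands alone); the `compute_logits` argument is hypothetical:

```c
/* Minimal stand-ins for llama2.c's types and helpers so the sketch is
 * self-contained; the real definitions live in run.c. */
typedef struct { int dim; int vocab_size; } Config;
typedef struct { float *x; float *logits; } RunState;
typedef struct { float *rms_final_weight; float *wcls; } TransformerWeights;

static void rmsnorm(float *o, float *x, float *weight, int size) {
    (void)o; (void)x; (void)weight; (void)size;  /* stub */
}
static void matmul(float *xout, float *x, float *w, int n, int d) {
    (void)xout; (void)x; (void)w; (void)n; (void)d;  /* stub */
}
static void run_layers(int token, int pos, Config *p, RunState *s,
                       TransformerWeights *w) {
    (void)token; (void)pos; (void)p; (void)s; (void)w;  /* stub: attention + FFN stack */
}

/* 'compute_logits' is a hypothetical extra argument. The layers must
 * always run (they fill the KV cache), but the final rmsnorm and the
 * dim x vocab_size classifier matmul only matter for the token we
 * actually sample from. */
static void transformer(int token, int pos, int compute_logits,
                        Config *p, RunState *s, TransformerWeights *w) {
    run_layers(token, pos, p, s, w);
    if (!compute_logits) return;   /* prompt token: logits unused */
    rmsnorm(s->x, s->x, w->rms_final_weight, p->dim);
    matmul(s->logits, s->x, w->wcls, p->dim, p->vocab_size);
}

int main(void) {
    Config p = { 4, 8 };
    float x[4] = {0}, logits[8] = {0}, rms_w[4] = {0}, wcls[32] = {0};
    RunState s = { x, logits };
    TransformerWeights w = { rms_w, wcls };
    transformer(1, 0, /*compute_logits=*/0, &p, &s, &w);  /* prompt token */
    transformer(2, 1, /*compute_logits=*/1, &p, &s, &w);  /* sampled-from token */
    return 0;
}
```

The caller would pass `compute_logits = (pos >= num_prompt_tokens - 1)`: the classifier matmul is the largest single matmul in the model, so skipping it for all prompt tokens but the last is where the perf bump comes from.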
What is the perf that you are getting on the TVM CPU and TVM GPU backends? If your Arm Compute Library implementation is ready, can you please share its perf as...
Hello, does the matmul implementation support all the quantizations (Q8_0, Q4_0) on QNN? Did we check the accuracy of the matmul?