Dominic23331
In some scenarios, the image encoder may need to be exported to ONNX. I exported it and deployed it with onnxruntime, which gave a significant speedup. Can I...
This allows exporting the image encoder to ONNX. I created a new file to perform the export.
### Describe the feature

I suggest supporting GhostNet: https://arxiv.org/abs/1911.11907

### Will you implement it?

- [X] I would like to implement this feature and create a PR!
### What is the issue?

I use a Jetson Nano to run ollama, but when I check jtop, I find that my GPU is not being used; ollama is running on the CPU....
During training, the program stops after printing the following output. How can I fix this? `2024-05-15 09:29:44.939294: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-05-15 09:29:44.939347: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register...
I am running chatglm2 on a Raspberry Pi 3B+ but cannot open the model. How can I fix this? `main: seed = 1715851844 Assert ' m_file ' failed at file : /home/dominic/project/InferLLM-main/src/file.cpp line 10 : inferllm::InputFile::InputFile(const std::string &, bool), extra message: Failed to open model file.Aborted `
This is the flash-attention code that I have encapsulated:

```python
# flash-attention
import math

import torch
import torch.nn as nn
from torch.nn.init import (
    xavier_uniform_,
    constant_,
    xavier_normal_
)
from...
```
I want to use a custom attention mask. Does flash attention support it? What should I do?