bolt

Published 20 hours ago


Metadata

Bolt is a deep learning library with high performance and heterogeneous flexibility.

Readme
Issues

44 bolt issues


version 1.2.1 and 1.3.0 issues

7 comments

Hello, thank you for your team's awesome work! I have some questions about using the bolt framework. Here's my working environment:
- target platform: Android-aarch64
- build platform: Linux
- ...

TinyBert model fails at inference on Linux_X86-64 after INT8 quantization with post_training_quantization

4 comments

1. `X2bolt -d onnx -m model -i PTQ` # outputs model_ptq_input.bolt
2. `./post_training_quantization -p model_ptq_input.bolt -i INT8_FP32 -b true -q NOQUANT -c 0 -o false`
3. Inference fails with the following error: [ERROR] thread 121948 file /home/xxx/project/bolt/compute/tensor/src/fully_connected.cpp...
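The issue concerns INT8 quantization produced by `post_training_quantization`. The excerpt does not show how bolt computes its quantization scales, so the following is only a generic sketch of symmetric per-tensor INT8 quantization (scale = max|x| / 127, values clamped to [-127, 127]). The function names are hypothetical and are not part of bolt.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not bolt's API: symmetric per-tensor INT8 quantization. */

/* scale = max|x| / 127, so the largest-magnitude value maps to +/-127. */
static float int8_scale(const float *x, size_t n) {
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > max_abs) max_abs = a;
    }
    return max_abs > 0.0f ? max_abs / 127.0f : 1.0f; /* avoid a zero scale */
}

/* q = clamp(round(x / scale), -127, 127), rounding half away from zero. */
static void quantize_int8(const float *x, int8_t *q, size_t n, float scale) {
    for (size_t i = 0; i < n; i++) {
        float v = x[i] / scale;
        long r = v >= 0.0f ? (long)(v + 0.5f) : -(long)(-v + 0.5f);
        if (r > 127) r = 127;
        if (r < -127) r = -127;
        q[i] = (int8_t)r;
    }
}

/* Reconstruct an approximation: x ~= q * scale. */
static void dequantize_int8(const int8_t *q, float *x, size_t n, float scale) {
    for (size_t i = 0; i < n; i++) x[i] = (float)q[i] * scale;
}
```

A symmetric range keeps zero exactly representable, which is a common choice for weights; real PTQ tools usually calibrate activation scales on sample data instead of taking the raw maximum.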

Is BGEMM supported?

3 comments

Are binarized linear and matmul operations supported?
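Whether bolt supports this is the open question here. Independently of bolt, a binary GEMM over {-1, +1} values is commonly implemented with XOR (or XNOR) and popcount on bit-packed operands: if bit 1 encodes +1 and bit 0 encodes -1, the dot product of two length-n vectors is n - 2 * popcount(a XOR b). The portable C sketch below shows that technique; the names and the packing convention are assumptions, not bolt's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable population count (number of set bits). */
static int popcount64(uint64_t v) {
    int c = 0;
    while (v) { v &= v - 1; c++; } /* clear the lowest set bit */
    return c;
}

/* Dot product of two {-1,+1} vectors of length n, packed 64 per word
   (bit 1 = +1, bit 0 = -1). Matching bits add +1, differing bits add -1,
   so dot = n - 2 * popcount(a XOR b). Unused high bits of the last word
   must be zero in both inputs. */
static int binary_dot(const uint64_t *a, const uint64_t *b, size_t n) {
    size_t words = (n + 63) / 64;
    int diff = 0;
    for (size_t i = 0; i < words; i++) diff += popcount64(a[i] ^ b[i]);
    return (int)n - 2 * diff;
}

/* Binary GEMM: C[m][n] = A[m][k] * B[n][k]^T, with every row bit-packed. */
static void bgemm(const uint64_t *a, const uint64_t *b, int *c,
                  size_t m, size_t n, size_t k) {
    size_t words = (k + 63) / 64;
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            c[i * n + j] = binary_dot(a + i * words, b + j * words, k);
}
```

Storing B row-wise (that is, transposed) lets both operands be walked word by word; optimized kernels would replace `popcount64` with a hardware popcount instruction.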

ARM CPU dilated conv goes wrong with NCHW input

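CONVOLUTION_ALGORITHM_GEMM does not support input in the NCHW layout. If the first layer of a model is a dilated conv and CONVOLUTION_ALGORITHM_GEMM is selected, the result is computed incorrectly: https://github.com/huawei-noah/bolt/blob/master/compute/tensor/src/cpu/arm/convolution.cpp#L72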
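A naive direct convolution is a convenient reference for checking whether a given algorithm computes a dilated conv correctly on NCHW input. The sketch below handles a single image (N = 1) with stride, zero padding and dilation. The weight layout [OC][IC][KH][KW] and all function names are assumptions for illustration, not bolt's internals.

```c
#include <stddef.h>

/* Output size along one axis for a dilated convolution. */
static int conv_out_dim(int in, int k, int stride, int pad, int dilation) {
    return (in + 2 * pad - dilation * (k - 1) - 1) / stride + 1;
}

/* Reference conv on NCHW input (N = 1). in: [ic][ih][iw],
   w: [oc][ic][kh][kw], out: [oc][oh][ow]. */
static void conv2d_nchw_dilated(const float *in, int ic, int ih, int iw,
                                const float *w, int oc, int kh, int kw,
                                int stride, int pad, int dilation,
                                float *out, int oh, int ow) {
    for (int o = 0; o < oc; o++)
        for (int y = 0; y < oh; y++)
            for (int x = 0; x < ow; x++) {
                float acc = 0.0f;
                for (int c = 0; c < ic; c++)
                    for (int ky = 0; ky < kh; ky++)
                        for (int kx = 0; kx < kw; kx++) {
                            /* Kernel taps are spaced `dilation` pixels apart. */
                            int iy = y * stride - pad + ky * dilation;
                            int ix = x * stride - pad + kx * dilation;
                            if (iy < 0 || iy >= ih || ix < 0 || ix >= iw)
                                continue; /* zero padding contributes nothing */
                            acc += in[((size_t)c * ih + iy) * iw + ix] *
                                   w[(((size_t)o * ic + c) * kh + ky) * kw + kx];
                        }
                out[((size_t)o * oh + y) * ow + x] = acc;
            }
}
```

For example, a 2x2 all-ones kernel with dilation 2 on a 3x3 input samples only the four corners, so an input of 1..9 yields 1 + 3 + 7 + 9 = 20.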

Unrolling the scalar dot operations in OCL kernels yields higher GFLOPs

2 comments

Before unrolling:

```c
#define DOT_A4B16C4(a, b, c) \
    { \
        c.x += (a.x * b.s0 + a.y * b.s1 + a.z * b.s2 + a.w * b.s3); \
        c.y +=...
```

How to set the runtime floating-point precision to fp16

4 comments

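Hello bolt developers, I ran into the following problems while using bolt and would appreciate it if you could take a look. Question 1: When using the C API, I found very little example code, and I could not find any way in the API to set the runtime floating-point precision (that is, the model is fp32 but computation runs at fp16 precision). Is converting the model to fp16 the only way to run fp16 code? Question 2: If only fp16 models can run fp16 code, how should the input be set? Since an fp16 model's tensors are also fp16, do I need to supply fp16 data from outside?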
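The excerpt does not say whether bolt expects fp16 input from the caller. If a caller does need to produce fp16 data itself, one generic option is converting fp32 values to IEEE 754 binary16 bit patterns on the host. The sketch below does that with round-to-nearest-even; the function names are hypothetical, not part of bolt's C API, and whether bolt accepts raw binary16 bits in this form is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Convert one IEEE 754 binary32 value to binary16 bits with
   round-to-nearest-even. Handles zero, subnormals, overflow, Inf and NaN. */
static uint16_t float_to_half(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x); /* reinterpret bits without aliasing issues */
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t exp = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x7FFFFFu;

    if (exp == 0xFFu) /* Inf stays Inf, NaN becomes a quiet NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x200u : 0u));
    int32_t e = (int32_t)exp - 127 + 15; /* rebias the exponent */
    if (e >= 0x1F) /* too large for binary16: overflow to infinity */
        return (uint16_t)(sign | 0x7C00u);
    if (e <= 0) { /* result is a binary16 subnormal or zero */
        if (e < -10) return sign; /* below half the smallest subnormal */
        mant |= 0x800000u; /* restore the implicit leading 1 */
        uint32_t shift = (uint32_t)(14 - e);
        uint32_t m = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1);
        if (rem > halfway || (rem == halfway && (m & 1u))) m++;
        return (uint16_t)(sign | m); /* a carry correctly yields the smallest normal */
    }
    uint32_t h = ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu; /* the 13 dropped mantissa bits */
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u))) h++; /* may carry into Inf */
    return (uint16_t)(sign | h);
}

/* Convert a whole fp32 input buffer to fp16 bit patterns. */
static void f32_to_f16_buffer(const float *src, uint16_t *dst, size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = float_to_half(src[i]);
}
```

Values above 65504 (the largest finite binary16) become infinity, so fp32 inputs with a large dynamic range may need rescaling before conversion.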

How to build on Raspberry Pi?

We have successfully built the bolt inference library, without the model converter, on a Raspberry Pi 3 Model B (armv7). #67

```bash
export CFLAGS="-march=armv7-a -mfpu=neon-vfpv4 "
export CXXFLAGS="-march=armv7-a -mfpu=neon-vfpv4 "
./install.sh --target=linux-armv7_blank --converter=off -t 4...
```

benchmark issue

8 comments

I benchmarked bolt with profiling disabled at the install stage, looking only at total model latency, and could not reach the performance reported in the article. I'm not sure what I'm doing wrong; could you please take a look? The results are shown in the screenshot: ![WXWorkCapture_16431661593291](https://user-images.githubusercontent.com/46490038/151124923-a8f3408c-670f-486b-b241-819c755243b5.png) The article reports 3.949 ms for squeezenet1.1 in half precision on a Qualcomm Snapdragon 888, but on a Xiaomi 11 (Snapdragon 888) I measured avg_time:7.443091ms/data for the fp16 case. To cross-check, I also tested the resnet50 network mentioned in https://github.com/huawei-noah/bolt/blob/master/docs/USER_HANDBOOK.md using the X2BOLT tool; my command was `./benchmark -a GPU -w 10 -l 10 -m ResNet-50_f16.bolt` ![WXWorkCapture_16431838464497](https://user-images.githubusercontent.com/46490038/151124496-efec7859-e742-4206-9f26-c07aa2b4a55a.png) The fp16 latency on the Snapdragon 888 was: Benchmark Result: Output Tensor prob desc: dt:DT_F16 memFormat:DF_NCHW stride(1000,1,1)...
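A common way to measure average latency is to run some untimed warm-up iterations and then average over a block of timed iterations. In the command above, `-w 10 -l 10` presumably sets 10 warm-up and 10 timed iterations; that reading of the flags is an assumption. The generic C sketch below shows the measurement pattern; `bench_avg_ms` and `count_calls_stub` are hypothetical names, and the stub stands in for one inference call.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <time.h>

/* One inference call; ctx is whatever state the model runner needs. */
typedef void (*infer_fn)(void *ctx);

/* Run `warmup` untimed iterations, then time `loops` iterations with a
   monotonic clock and return the average latency per iteration in ms. */
static double bench_avg_ms(infer_fn run, void *ctx, int warmup, int loops) {
    for (int i = 0; i < warmup; i++) run(ctx); /* excluded from timing */
    if (loops <= 0) return 0.0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < loops; i++) run(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double total_ms = (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
                      (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
    return total_ms / loops;
}

/* Hypothetical stand-in for a real inference call: it only counts invocations. */
static void count_calls_stub(void *ctx) { (*(int *)ctx)++; }
```

For GPU inference, each timed call has to block until the results are ready; otherwise the loop measures only kernel submission time. Device clocks and thermal state also vary between phones, which can shift results even on the same SoC.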

Build script needs an update for linux aarch64

1 comment

Fix build error on linux aarch64. Building this library with `bash ./install.sh --target=linux-aarch64` causes an error: the script downloads protocolbuffers from https://github.com/protocolbuffers/protobuf/releases/download/v3.1.0/protoc-3.1.0-.zip.

steelONIONknight

Using Netron to visualise a bolt model

1 comment


About


Topics: android, deep-learning, tensorflow, ios, arm, x86, nlp, mobile, cv, onnx, cnn, rnn, high-performance, inference, bolt, caffe, huawei, mali, noah

902 stars, 156 forks
