Shaquille.Wu
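I've hit a performance bottleneck when quantizing a model with QAT, and I'd like to ask the experts at Alibaba to explain. If I quantize the original .mnn file directly with quantized.out to generate a quantized model, there is no problem. If I instead do quantization-aware training following the QAT method in the README (that is, following the code in MobileNetV2Utils.cpp), the quantized model is generated normally, but a pile of redundant ops is produced: FloatToInt8->Scale->Int8ToFloat, so the final quantized model runs slower than the fp32 model. In principle these FloatToInt8/Scale/Int8ToFloat ops could all be merged. Is there a way to merge them?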
Hello CUTLASS experts, I'm confused by the implementation of Semaphore. Its "fetch" looks like this: ```C++ if (wait_thread) { #if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700 asm volatile ("ld.global.acquire.gpu.b32 %0, [%1];\n"...
The PETR model needs the transformation from the image coordinate system to the lidar coordinate system, and the Apollo source code provides the relevant parameters, as follows: std::vector k_data_{ -1.40307297e-03, 9.07780395e-06, 4.84838307e-01, -5.43047376e-02, -1.40780103e-04, 1.25770375e-05, 1.04126692e+00, 7.67668605e-01, -1.02884378e-05, -1.41007011e-03, 1.02823459e-01, -3.07415128e-01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.00000000e+00, -9.39000631e-04, -7.65239349e-07, 1.14073277e+00, 4.46270645e-01, 1.04998052e-03, 1.91798881e-05, 2.06218868e-01, 7.42717385e-01, 1.48074005e-05, -1.40855671e-03, 7.45946690e-02,...
Hello experts, I use cyberRT to connect to lgsvl, and I found that the "linear_acceleration" of "corrected_imu" is always "0.0, 0.0, 0.0". I think this is abnormal. I don't know how...
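I read the code in bev_lanedet.py carefully and did not see anything about the virtual camera concept. Is the virtual-camera projection done outside the model before the image is fed in, or is the original image fed to the model directly? I have never figured this out and hope someone who knows can give me some pointers. Besides that, I have a question about the dataset: the README.md in the doc directory only says "please download the Apollo dataset and download the annotation file provided by the author" but gives no download address for the dataset, so I am at a bit of a loss. I hope someone can give me some pointers.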
Hi, I've run into a new problem with "LDG". The assembler raises an exception like this: ` File hack.main.sm_86.cuasm:795 : [B------:R-:W2:-:S04] /*0080*/ LDG.E.LTC128B.CONSTANT R4, desc[UR4][R2.64] ; Error when assembling instruction "[B------:R-:W2:-:S04]...
Hi. I found the "L2Bank" in your microbenchmark, but I don't understand the principle and theory behind the code. Could you explain it to me? I found some reference...
Hi, my CuAssembler raises an exception when I test the "TestData". My nvcc is 11.3 and my arch is sm_86. It throws the following exception when I execute "make hack":...
I checked the whole codebase and didn't find any IR or operator for convolution in the source code or tutorials. I don't know how to develop a high-performance convolution kernel...
I have a problem with the sensor configuration. I modified vehicles.json (in ./mongo/setup), removed the old docker images, rebuilt SORA (executed "docker compose up --build -d"), and restarted the computer, but in the end,...