blueWatermelonFri
I encountered a strange bug while programming tensor cores with the **WMMA** API on an A800. I tried to print the size of an element in the fragment; normally **sizeof**(fp16) is 2,...
Hey, I was running the l2 cache test on my A800 80GB GPU, and when I tried to modify the parameter `N`, I got some strange results. By default, `N`=64, and the result...
I tried to test bandwidth with the cuda-stream benchmark; my device is a 4060 Ti, and the measured bandwidth is 288 GB/s. I changed the param `max_buffer_size` from `128l * 1024 * 1024 + 2` to `1024 * 1024` as...
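One likely explanation for a bandwidth number that jumps when the buffer shrinks: a STREAM-style measurement only reflects DRAM if the working set is much larger than the last-level cache. Below is a minimal sketch of the usual triad bandwidth accounting, assuming float32 elements and the 4060 Ti's 32 MB L2 cache (the cache size is my assumption about this card, not something stated in the post):

```python
def triad_bandwidth_gbs(n_elems, elem_bytes, elapsed_s):
    """Effective bandwidth in GB/s for one STREAM triad pass.

    The triad kernel a[i] = b[i] + s * c[i] touches three arrays
    per iteration: two reads plus one write."""
    bytes_moved = 3 * n_elems * elem_bytes
    return bytes_moved / elapsed_s / 1e9

# With only 1024 * 1024 float32 elements per array, all three
# arrays together are 12 MB -- smaller than a 32 MB L2 -- so the
# benchmark ends up measuring cache, not DRAM, bandwidth.
working_set = 3 * 1024 * 1024 * 4          # bytes
fits_in_l2 = working_set <= 32 * 1024 * 1024
print(fits_in_l2)
```

So after shrinking `max_buffer_size`, a figure well above the card's DRAM spec would be expected rather than surprising.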
When doing on-board debugging of a yolov5s model through a local Ubuntu server, fp16 accuracy dropped by 4 points. Is this expected? The rknn-toolkit2 version is: ```rknn-toolkit2 version: 2.0.0b0+9bab5682``` The rk3588 driver versions are: ``` D RKNNAPI: API: 2.0.0b0 (18eacd0 build@2024-03-22T06:07:59) D RKNNAPI: DRV: rknn_server: 2.0.0b0 (18eacd0 build@2024-03-22T14:07:19) D RKNNAPI: DRV: rknnrt: 2.0.0b0 (35a6907d79@2024-03-24T10:31:14) ``` The rknn.config parameters are: ``` rknn.config(mean_values=[[0,...
Hey, I noticed that in the gpu-cache test the blocksize is `256`; why is it not `1024`? When I changed the blocksize from `256` to `1024`, the measured L1 cache bandwidth has some...
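One common reason a benchmark pins the block size at 256 is occupancy: the block size must divide evenly into the SM's resident-thread limit, or part of the SM sits idle. A sketch, assuming a part whose limit is 1536 threads per SM (true of Ada-class GPUs; an A100/A800 allows 2048, where 1024-thread blocks would fit evenly):

```python
def occupancy(block_size, max_threads_per_sm=1536):
    """Fraction of an SM's thread slots filled by whole blocks."""
    resident_blocks = max_threads_per_sm // block_size
    return resident_blocks * block_size / max_threads_per_sm

# Six 256-thread blocks exactly fill a 1536-thread SM, but only one
# 1024-thread block fits, leaving a third of the slots empty.
print(occupancy(256))    # 1.0
print(round(occupancy(1024), 4))
```

Lower occupancy means fewer in-flight loads to hide latency, which would show up as a drop in measured L1 bandwidth; the exact numbers depend on which GPU the test is run on.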
Hi, I was reading the source code of AutoGPTQ, but I am confused by fasterquant(). What happens if there is a zero on the diagonal of the Hessian matrix? ```python dead = torch.diag(H) ==...
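For context on the question above, the GPTQ paper's handling of this case can be sketched in plain Python (this mirrors the torch idiom `dead = torch.diag(H) == 0; H[dead, dead] = 1; W[:, dead] = 0`, but is my own minimal reconstruction, not the AutoGPTQ source itself):

```python
def guard_dead_columns(H, W):
    """A zero on the diagonal of the Hessian H means that input
    channel never fired on the calibration data ("dead" column).
    Setting H[i][i] = 1 keeps the later Cholesky / inversion from
    hitting a singular pivot, and zeroing the matching weight column
    W[:, i] makes that choice harmless: a dead input contributes
    nothing to the layer output anyway."""
    n = len(H)
    for i in range(n):
        if H[i][i] == 0:
            H[i][i] = 1.0          # avoid a singular pivot
            for row in W:
                row[i] = 0.0       # weights on a dead input are irrelevant
    return H, W

H = [[2.0, 0.0], [0.0, 0.0]]       # second diagonal entry is zero
W = [[0.5, 0.7], [0.1, 0.3]]
H, W = guard_dead_columns(H, W)
print(H[1][1], W[0][1], W[1][1])   # 1.0 0.0 0.0
```

So nothing bad happens at quantization time: the dead column is neutralized before the inverse-Hessian machinery runs.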
## Problem description Current models, taking doubao-seed-1-6-250615 as an example, have **thinking mode** enabled by default, which makes translation take too long and consume too many tokens, since the thinking-phase tokens are also billed. For the same sentence, translating without **thinking mode** uses fewer than 100 tokens, while **thinking mode** needs around 300. A simple task like translation does not need thinking mode. As a result, I am forced to use doubao-1-5-pro-32k-250115, which has no **thinking mode**. ## Proposed solution Provide a switch to enable or disable **thinking mode**, or simply have it disabled by default for everyone.
Hello, first, I've been studying the source code to better understand the implementation of the sieving algorithms, and I have a quick question about a specific design choice. I noticed...