BEVFormer_tensorrt
INT8 accuracy issue for Multiscale deformable attention
Hello, I'm curious about the accuracy of the function ms_deformable_im2col_cuda_int8():

    channels /= 4;
    const int value_step = num_heads * spatial_size * channels;
    const int output_step = num_heads * num_query * channels;
    const int points_step = num_query * points_per_group;
    const int weight_step = num_heads * num_query * num_levels * num_point;
    const int offset_step = weight_step * 2;

    for (int batch_index = 0; batch_index < batch_size; batch_index++) {
      ms_deformable_im2col_gpu_kernel_int8<__half2>
          <<<GET_BLOCKS(num_kernels), THREADS_PER_BLOCK, 0, stream>>>(
              num_kernels, data_value, scale_value, data_spatial_shapes,
              data_reference_points, data_sampling_offsets, scale_offset,
              data_attn_weight, scale_weight, 1, spatial_size, num_heads,
              channels, num_levels, num_query, num_point, points_per_group,
              data_col, scale_out);
      data_value += value_step;// data_col += output_step;
      data_reference_points += points_step;
      data_sampling_offsets += offset_step;
      data_attn_weight += weight_step;
    }

My question is about data_value += value_step; and const int output_step = num_heads * num_query * channels;. By that point channels has already been divided by 4, so this step does not advance data_value to the next batch's data, right?
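To make the stride question concrete, here is a minimal host-side sketch of the pointer arithmetic. It assumes that data_value points to a 4-byte packed element holding four int8 channels (the hypothetical Int8x4 struct below); the snippet above does not show the pointer's type, so this is an assumption, not the repository's actual declaration. The helper names batch_stride_int8, batch_stride_packed and bytes_advanced are also made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical packed element: four int8 channels stored in 4 bytes.
// The real element type of data_value is not shown in the post.
struct Int8x4 {
    std::int8_t v[4];
};
static_assert(sizeof(Int8x4) == 4, "Int8x4 must pack four int8 values");

// Per-batch stride counted in single int8 elements (unpacked channel count).
std::size_t batch_stride_int8(int num_heads, int spatial_size, int channels) {
    return static_cast<std::size_t>(num_heads) * spatial_size * channels;
}

// Per-batch stride counted in packed elements, mirroring `channels /= 4`
// followed by value_step = num_heads * spatial_size * channels.
// Assumes channels is a multiple of 4.
std::size_t batch_stride_packed(int num_heads, int spatial_size, int channels) {
    channels /= 4;
    return static_cast<std::size_t>(num_heads) * spatial_size * channels;
}

// Bytes skipped when a pointer to T is advanced by `step` elements.
template <typename T>
std::size_t bytes_advanced(std::size_t step) {
    return step * sizeof(T);
}
```

Under that assumption, advancing an Int8x4 pointer by the reduced step skips the same number of bytes as advancing an int8 pointer by the full step, so it would land on the next batch. If data_value were instead a plain int8 pointer, the reduced step would cover only a quarter of one batch, which is the situation the question describes. Which case applies depends on the declared type in the actual source.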