less_slow.cpp
Data Alignment may have an error?
The loop in f32_pairwise_accumulation runs f32s_in_cache_line_half_k * 2 times, while the other loop runs only f32s_in_cache_line_half_k times. If the two are meant to be compared, they are not doing the same amount of work.
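
To make the concern concrete, here is a minimal sketch of the mismatch. It is not the repository's actual code: the 64-byte cache line, the function names `accumulate_full` and `accumulate_half`, and the loop bodies are all assumptions for illustration. Only the constant name `f32s_in_cache_line_half_k` comes from the report above.

```cpp
#include <cstddef>

// Assumption: a 64-byte cache line, so 16 floats per line and 8 per half-line.
constexpr std::size_t f32s_in_cache_line_k = 64 / sizeof(float);
constexpr std::size_t f32s_in_cache_line_half_k = f32s_in_cache_line_k / 2;

// Hypothetical stand-in for the loop that runs `f32s_in_cache_line_half_k * 2` times.
// Returns the number of additions performed.
inline std::size_t accumulate_full(float *a, float const *b) {
    std::size_t additions = 0;
    for (std::size_t i = 0; i != f32s_in_cache_line_half_k * 2; ++i) {
        a[i] += b[i];
        ++additions;
    }
    return additions;
}

// Hypothetical stand-in for the loop that runs only `f32s_in_cache_line_half_k` times.
inline std::size_t accumulate_half(float *a, float const *b) {
    std::size_t additions = 0;
    for (std::size_t i = 0; i != f32s_in_cache_line_half_k; ++i) {
        a[i] += b[i];
        ++additions;
    }
    return additions;
}
```

With these assumptions, the first function performs twice as many additions per call as the second, so a per-iteration timing difference between the two would reflect the workload as well as the alignment.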