
support AVX2 x86 instructions

Open drossetti opened this issue 7 years ago • 3 comments

drossetti, Dec 06 '17 22:12

It doesn't result in a speedup, as memcpy is memory-bound.

Test                     Size(B)         Avg.Time(us)
gdr_copy_to_mapping             1             0.2038
gdr_copy_to_mapping             2             0.1940
gdr_copy_to_mapping             4             0.1860
gdr_copy_to_mapping             8             0.1865
DBG:  using AVX2 implementation of gdr_copy_to_bar
gdr_copy_to_mapping            16             0.1960
gdr_copy_to_mapping            32             0.1928
gdr_copy_to_mapping            64             0.1901
gdr_copy_to_mapping           128             0.1869
gdr_copy_to_mapping           256             0.1926
gdr_copy_to_mapping           512             0.2109
gdr_copy_to_mapping          1024             0.2547
gdr_copy_to_mapping          2048             0.3260
gdr_copy_to_mapping          4096             0.4883
gdr_copy_to_mapping          8192             0.8617
gdr_copy_to_mapping         16384             1.6531
gdr_copy_to_mapping         32768             3.2493
gdr_copy_to_mapping         65536             6.4663
gdr_copy_to_mapping        131072            12.8850
gdr_copy_to_mapping        262144            25.7638
gdr_copy_to_mapping        524288            51.4691
gdr_copy_to_mapping       1048576           102.8449
gdr_copy_to_mapping       2097152           206.1706
gdr_copy_to_mapping       4194304           413.7580
gdr_copy_to_mapping       8388608           828.0581
gdr_copy_to_mapping      16777216          1676.4880

imaginary-person, Jul 10 '21 06:07
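
One way to read the table above is to convert each row into an effective throughput. The helper below is only an illustrative sketch (the name throughput_gbps is made up here and is not part of gdrcopy); it assumes 1 GB = 1e9 bytes and takes the size and average time exactly as printed in the benchmark output.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of gdrcopy.
 * Converts one benchmark row (size in bytes, average time in microseconds)
 * into GB/s, with 1 GB = 1e9 bytes: bytes per microsecond equals MB/s * 1e-6,
 * so dividing by 1000 gives GB/s.
 */
static double throughput_gbps(size_t size_bytes, double avg_time_us)
{
    return (double)size_bytes / avg_time_us / 1000.0;
}
```

For the largest row, throughput_gbps(16777216, 1676.4880) comes out at roughly 10 GB/s, and the 1 MiB row lands at about the same figure. Time doubling with size while throughput stays flat is what one would expect from a bandwidth-limited copy, which fits the comment above.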

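BTW, to test for AVX2 or AVX-512 support, __cpuid_count should be used instead of __get_cpuid: those feature bits are reported in CPUID leaf 7, sub-leaf 0, and __get_cpuid takes no sub-leaf argument. has_avx2 seems to be computed incorrectly in the source code and is 0 even when it should be 1.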

imaginary-person, Jul 10 '21 06:07
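
A minimal sketch of the check suggested above, assuming GCC or Clang and their <cpuid.h> header. The function name cpu_reports_avx2 is hypothetical and this is not the code in gdrcopy. It only reads the CPU feature bit (leaf 7, sub-leaf 0, EBX bit 5); it does not verify that the OS has enabled YMM state, which a complete check would also need.

```c
#include <stddef.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/*
 * Hypothetical helper, not gdrcopy's implementation.
 * Returns 1 if CPUID reports AVX2, 0 otherwise (always 0 on non-x86 builds).
 */
static int cpu_reports_avx2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Leaf 7 has to exist before it can be queried. */
    if (__get_cpuid_max(0, NULL) < 7)
        return 0;

    /* __cpuid_count loads the sub-leaf (0) into ECX before executing CPUID. */
    __cpuid_count(7, 0, eax, ebx, ecx, edx);

    /* AVX2 is EBX bit 5 of leaf 7, sub-leaf 0. */
    return (int)((ebx >> 5) & 1u);
#else
    return 0;
#endif
}
```

An AVX-512F check would follow the same pattern with EBX bit 16. Where the compiler provides it, __builtin_cpu_supports("avx2") is a simpler alternative that also accounts for OS support.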

It doesn't result in a speedup, as memcpy is memory-bound.

May I ask what you did for the AVX2 optimization compared to AVX?

protoss1235, May 16 '23 09:05