performance improvement for CPUs with slow unaligned 64-bit copy

Open nbkolchin opened this issue 12 years ago • 2 comments

On ARM (and probably other platforms), unaligned 64-bit access is really slow. Instead of using the 'STORE64(p, LOAD64(src))' macro, it is better to have a dedicated function for this copy.

The logic was copied from the original snappy implementation.
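The patch itself is not reproduced in this thread, so the following is only a minimal sketch of the idea, assuming the dedicated function splits the 8-byte copy into two 32-bit moves. The name unaligned_copy64 is taken from the benchmark label in this post; the signature and body are assumptions, not the submitted code.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy 8 bytes between possibly unaligned addresses
 * as two 32-bit moves instead of one 64-bit load/store pair.
 * memcpy() with a constant size is normally lowered by the compiler to
 * plain loads and stores that are legal for the target.
 */
static inline void unaligned_copy64(const void *src, void *dst)
{
    uint32_t lo, hi;

    /* Read both halves first so the result does not depend on store order. */
    memcpy(&lo, src, sizeof(lo));
    memcpy(&hi, (const unsigned char *)src + 4, sizeof(hi));

    memcpy(dst, &lo, sizeof(lo));
    memcpy((unsigned char *)dst + 4, &hi, sizeof(hi));
}
```

A call site that previously read 'STORE64(p, LOAD64(src))' would then become 'unaligned_copy64(src, p)' under this assumed argument order.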

Test results for ARM Nova A9500:

Old version:

        snappy-c, 0,     786507,     3.2551,     1.2583
        snappy-c, 1,      36915,     4.6512,    13.9788
        snappy-c, 2,      56898,    15.4230,   113.3201
        snappy-c, 3,      62621,    15.8439,   111.8671
        snappy-c, 4,      54598,    12.5148,   104.7013

unaligned_copy64:

        snappy-c, 0,     786507,     3.1400,     1.2359
        snappy-c, 1,      36915,     2.9551,     2.6206
        snappy-c, 2,      56898,     4.1648,     4.1232
        snappy-c, 3,      62621,     4.3992,     4.1195
        snappy-c, 4,      54598,     4.2060,     4.0001

As you can see, decompression is almost 30x faster.

P.S. The "original" snappy implementation uses a similar approach, but it checks the pointer size to decide how the data is copied. This hurts performance on 32-bit Intel platforms, where unaligned 64-bit data transfer is pretty fast.

nbkolchin, Jun 14 '12 21:06

On Thu, Jun 14, 2012 at 02:40:05PM -0700, nbkolchin wrote:

> On ARM (and probably other platforms) unaligned 64-bit access is really slow. Instead of using 'STORE64(p, LOAD64(src))' macro, it is better to have special function for that.

Thanks for the patch and the data. But I don't like the implementation. Could you hide this logic in get_unaligned()? My code was supposed to be used in the Linux kernel, which already has an architecture-aware get_unaligned. I just didn't implement that in the compat layer for the user-space version.

Something like

        #ifdef __arm__
        #define get_unaligned() unaligned version
        #else
        #define get_unaligned()
        #endif

and probably the same for put_unaligned.
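To make the suggestion concrete, here is a rough sketch of what such a user-space compat layer could look like. It is not code from snappy-c or from the kernel: the names get_unaligned64 and put_unaligned64 are hypothetical, the helpers are fixed to 64-bit values for brevity, and the non-ARM branch assumes a compiler that supports the GCC/Clang packed attribute.

```c
#include <stdint.h>
#include <string.h>

#if defined(__arm__)
/*
 * Assumed slow-unaligned path: go through memcpy() so the compiler
 * chooses an access sequence that is safe on this target.
 */
static inline uint64_t get_unaligned64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

static inline void put_unaligned64(uint64_t v, void *p)
{
    memcpy(p, &v, sizeof(v));
}
#else
/*
 * Assumed fast-unaligned path: a packed struct tells GCC/Clang that the
 * field may sit at any address (compiler extension, not standard C).
 */
struct una_u64 { uint64_t x; } __attribute__((packed));

static inline uint64_t get_unaligned64(const void *p)
{
    return ((const struct una_u64 *)p)->x;
}

static inline void put_unaligned64(uint64_t v, void *p)
{
    ((struct una_u64 *)p)->x = v;
}
#endif
```

With helpers like these, the copy would be written as 'put_unaligned64(get_unaligned64(src), p)', keeping the architecture check out of the decompression loop.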

Thanks.

-Andi

andikleen, Jun 15 '12 11:06

With the Linux unaligned functions, performance decreases:

        snappy-c, 0,     786507,     3.8460,     1.2267
        snappy-c, 1,      36915,     5.5314,     5.5649
        snappy-c, 2,      56898,     6.9741,     6.8187
        snappy-c, 3,      62621,     7.3982,     6.8128
        snappy-c, 4,      54598,     6.9541,     6.4243

The Linux interface is OK for drivers, but bad for an archiver. Modern ARM cores support unaligned access for data in normal memory, but not for device-mapped areas, etc.

Since 4.7, GCC has a special define to detect whether unaligned access is allowed without penalty: __ARM_FEATURE_UNALIGNED.

64-bit unaligned access is not permitted on any current ARM model.
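One way these two facts could be combined at compile time is sketched below. This is an illustration only, not the submitted patch: the function name copy64_select is hypothetical, and the choice of fallback for each case is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper that picks an 8-byte copy strategy per target. */
static inline void copy64_select(const void *src, void *dst)
{
#if defined(__arm__) && defined(__ARM_FEATURE_UNALIGNED)
    /* ARM with cheap unaligned word access: use two 32-bit moves,
     * since a single unaligned 64-bit access is not available. */
    uint32_t lo, hi;
    memcpy(&lo, src, sizeof(lo));
    memcpy(&hi, (const unsigned char *)src + 4, sizeof(hi));
    memcpy(dst, &lo, sizeof(lo));
    memcpy((unsigned char *)dst + 4, &hi, sizeof(hi));
#elif defined(__arm__)
    /* ARM without the feature macro: assume strict alignment and
     * copy byte by byte through a temporary. */
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    unsigned char tmp[8];
    int i;
    for (i = 0; i < 8; i++)
        tmp[i] = s[i];
    for (i = 0; i < 8; i++)
        d[i] = tmp[i];
#else
    /* Other targets (assumed to handle unaligned 64-bit access well):
     * let the compiler emit a single 64-bit load and store. */
    uint64_t v;
    memcpy(&v, src, sizeof(v));
    memcpy(dst, &v, sizeof(v));
#endif
}
```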

nbkolchin, Jun 19 '12 22:06