Descent3
Use compiler intrinsic byteswapping functions
I'm not trying to be stubborn, but the current C++ byteswap code is a lot slower than the compiler intrinsic functions.
Converts INTEL_* and MOTOROLA_* macros to use the proper compiler intrinsic functions.
List of macro aliases:
From POSIX for networking conversions:
- host to network (16-bit) : htons()
- host to network (32-bit) : htonl()
- network to host (16-bit) : ntohs()
- network to host (32-bit) : ntohl()
From GNU for endian conversions:
- host to little endian (16-bit) : htole16()
- host to little endian (32-bit) : htole32()
- host to little endian (64-bit) : htole64()
- little endian to host (16-bit) : le16toh()
- little endian to host (32-bit) : le32toh()
- little endian to host (64-bit) : le64toh()
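As a rough illustration of how the aliases listed above are typically called, here is a minimal sketch. The helper names `read_le32` and `write_be16` are hypothetical and not part of this PR, and the header locations assume glibc on Linux (the BSDs ship the endian functions in `<sys/endian.h>` instead).

```cpp
#include <arpa/inet.h>  // htons, htonl, ntohs, ntohl (POSIX)
#include <endian.h>     // htole16/32/64, le16toh/32/64 (glibc; assumption: Linux)
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit little-endian value from a byte buffer
// and return it in host byte order.
inline std::uint32_t read_le32(const unsigned char *buf) {
  std::uint32_t raw;
  std::memcpy(&raw, buf, sizeof raw);  // avoids unaligned/aliasing issues
  return le32toh(raw);                 // no-op on little-endian hosts
}

// Hypothetical helper: store a 16-bit host value into a buffer in
// network (big-endian) byte order.
inline void write_be16(unsigned char *buf, std::uint16_t value) {
  std::uint16_t net = htons(value);
  std::memcpy(buf, &net, sizeof net);
}
```

On a little-endian host the `le*toh()` / `htole*()` calls compile to nothing, while `htons()` / `htonl()` swap; on a big-endian host it is the other way around.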
Comparison to existing code
You can see the assembly output here
C++ byteswap implementation
unsigned short D3::byteswap<unsigned short>(unsigned short):
push rbp
mov rbp, rsp
mov eax, edi
mov WORD PTR [rbp-20], ax
mov QWORD PTR [rbp-8], 0
jmp .L12
.L13:
mov eax, 1
sub rax, QWORD PTR [rbp-8]
lea rdx, [rbp-20]
add rax, rdx
lea rcx, [rbp-10]
mov rdx, QWORD PTR [rbp-8]
add rdx, rcx
movzx eax, BYTE PTR [rax]
mov BYTE PTR [rdx], al
add QWORD PTR [rbp-8], 1
.L12:
cmp QWORD PTR [rbp-8], 1
jbe .L13
movzx eax, WORD PTR [rbp-10]
pop rbp
ret
Compiler intrinsic byteswap implementation
__bswap_16:
push rbp
mov rbp, rsp
mov eax, edi
mov WORD PTR [rbp-4], ax
movzx eax, WORD PTR [rbp-4]
rol ax, 8
pop rbp
ret
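To make the two listings above easier to relate to source code, here is a minimal sketch of both approaches. `byteswap_loop` is a hypothetical stand-in written to resemble the byte-by-byte loop visible in the first listing, not the actual `D3::byteswap` source, and `byteswap_builtin` assumes GCC or Clang, where the `__builtin_bswap16/32/64` builtins are available.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical generic swap: copies the bytes of `value` in reverse order,
// one byte per loop iteration.
template <typename T>
T byteswap_loop(T value) {
  unsigned char in[sizeof(T)];
  unsigned char out[sizeof(T)];
  std::memcpy(in, &value, sizeof(T));
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    out[i] = in[sizeof(T) - 1 - i];
  }
  T result;
  std::memcpy(&result, out, sizeof(T));
  return result;
}

// Builtin-based swaps (assumption: GCC or Clang). Each of these usually
// maps to a single rotate or bswap instruction on x86-64.
inline std::uint16_t byteswap_builtin(std::uint16_t v) { return __builtin_bswap16(v); }
inline std::uint32_t byteswap_builtin(std::uint32_t v) { return __builtin_bswap32(v); }
inline std::uint64_t byteswap_builtin(std::uint64_t v) { return __builtin_bswap64(v); }
```

Both versions should return the same values, so they can be checked against each other. Both listings above spill to the stack and set up a frame pointer, which suggests unoptimized output, so the gap may look different at the optimization level actually used for release builds.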