Consider the following code:
```cpp
bool AllZeroes(const char buf[4]) {
    return buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 0;
}
```
Output assembly from Clang 13 with `-O3`:

```asm
AllZeroes(char const*):                # @AllZeroes(char const*)
        cmp     byte ptr [rdi], 0
        je      .LBB0_2
        xor     eax, eax
        ret
.LBB0_2:
        cmp     byte ptr [rdi + 1], 0
        je      .LBB0_4
        xor     eax, eax
        ret
.LBB0_4:
        cmp     byte ptr [rdi + 2], 0
        je      .LBB0_6
        xor     eax, eax
        ret
.LBB0_6:
        cmp     byte ptr [rdi + 3], 0
        sete    al
        ret
```
Each byte is compared individually, but it could have been optimized into a single 32-bit int comparison:

```cpp
bool AllZeroes(const char buf[4]) {
    return *(int*)buf == 0;
}
```
Resulting in:

```asm
AllZeroes2(char const*):               # @AllZeroes2(char const*)
        cmp     dword ptr [rdi], 0
        sete    al
        ret
```
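As an aside, the `*(int*)buf` cast reads `char` storage through an `int` lvalue, which is formally undefined behavior (aliasing and possible misalignment). A well-defined way to express the same single 32-bit load, not taken from the question but a common sketch, is a fixed-size `memcpy`, which compilers typically lower to one load (the name `AllZeroesMemcpy` is mine):

```cpp
#include <cstring>

// Hypothetical, well-defined variant: copy the 4 bytes into an int,
// then compare once. Optimizers usually turn this memcpy into a
// single 32-bit load rather than a function call.
bool AllZeroesMemcpy(const char buf[4]) {
    int v;
    std::memcpy(&v, buf, sizeof v);
    return v == 0;
}
```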
I've also checked GCC and MSVC, and neither of them does this optimization. Is this disallowed by the C++ specification?
Edit: Changing the short-circuited AND (`&&`) to bitwise AND (`&`) will generate the optimized code. Also, changing the order in which the bytes are compared doesn't affect the code gen: https://godbolt.org/z/Y7TcG93sP
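For reference, the `&` variant described in the edit would look like this (a sketch of the transformation; the exact code the asker tested is behind the Godbolt link). Note that `==` binds tighter than `&`, so no extra parentheses are needed, and `&` evaluates both operands unconditionally, which removes the short-circuit control flow the optimizer was preserving:

```cpp
// Bitwise AND of the four comparison results: no short-circuiting,
// so the compiler is free to merge the loads into one 32-bit compare.
bool AllZeroesBitwise(const char buf[4]) {
    return buf[0] == 0 & buf[1] == 0 & buf[2] == 0 & buf[3] == 0;
}
```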