
Why don't modern compilers coalesce neighboring memory accesses?

Consider the following code:

bool AllZeroes(const char buf[4])
{
    return buf[0] == 0 &&
           buf[1] == 0 &&
           buf[2] == 0 &&
           buf[3] == 0;
}

Output assembly from Clang 13 with -O3:

AllZeroes(char const*):                        # @AllZeroes(char const*)
        cmp     byte ptr [rdi], 0
        je      .LBB0_2
        xor     eax, eax
        ret
.LBB0_2:
        cmp     byte ptr [rdi + 1], 0
        je      .LBB0_4
        xor     eax, eax
        ret
.LBB0_4:
        cmp     byte ptr [rdi + 2], 0
        je      .LBB0_6
        xor     eax, eax
        ret
.LBB0_6:
        cmp     byte ptr [rdi + 3], 0
        sete    al
        ret

Each byte is compared individually, but it could've been optimized into a single 32-bit int comparison:

bool AllZeroes2(const char buf[4])
{
    return *(int*)buf == 0;
}

Resulting in:

AllZeroes2(char const*):                      # @AllZeroes2(char const*)
        cmp     dword ptr [rdi], 0
        sete    al
        ret
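
As a side note, the pointer cast above assumes that buf is suitably aligned for an int, and it reads char objects through an int lvalue, which the C++ aliasing rules do not permit. A minimal sketch of the same 4-byte comparison written with std::memcpy is shown here; the name AllZeroesMemcpy is hypothetical and not part of the original code. Mainstream compilers usually turn the fixed-size memcpy into a single 32-bit load at -O2 or higher on x86-64, but that is an expectation worth checking on Compiler Explorer, not a guarantee.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the original question): compares the four
// bytes as one 32-bit value without an int* cast.
bool AllZeroesMemcpy(const char buf[4])
{
    std::uint32_t word;
    // Copy exactly four bytes into a properly aligned local object.
    std::memcpy(&word, buf, sizeof word);
    // All four bytes are zero exactly when the combined value is zero.
    return word == 0;
}
```

Like the cast version, this reads all four bytes unconditionally, so it is only valid when the caller really passes at least four readable bytes.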

I've also checked GCC and MSVC, and neither of them does this optimization. Is this disallowed by the C++ specification?

Edit: Changing the short-circuiting AND (&&) to a bitwise AND (&) makes the compiler generate the optimized code. Also, changing the order in which the bytes are compared doesn't affect the code generation: https://godbolt.org/z/Y7TcG93sP
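
For reference, the bitwise variant mentioned in the edit can be sketched as follows. The name AllZeroesBitwise is hypothetical, chosen only to tell it apart from the original function. The source-level difference is that & does not short-circuit, so all four bytes are read on every call, whereas with && the bytes after the first nonzero one are never accessed.

```cpp
// Hypothetical name; same check as AllZeroes, with & in place of &&.
bool AllZeroesBitwise(const char buf[4])
{
    // Each comparison yields a bool; & evaluates every operand, so
    // buf[0] through buf[3] are all read regardless of their values.
    return (buf[0] == 0) &
           (buf[1] == 0) &
           (buf[2] == 0) &
           (buf[3] == 0);
}
```

Both versions return the same result for any buffer with four readable bytes; they differ only in which bytes the source code is allowed to touch.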

