Raphael S.Carvalho's Programming Blog

raphael.scarv@gmail.com

"A programmer that cannot debug effectively is blind."

Saturday, August 31, 2013

Endianness

Look carefully at the following snippet of code:

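int c = 0xFFAABBCC;
/* %hhx converts the argument to unsigned char, so the byte 0xCC prints as "cc" even where char is signed */
printf("%02hhx\n", ((char *)&c)[0]);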

If this idiom is unfamiliar, you may be asking yourself: how does it work?

Each hexadecimal digit represents a nibble, that is, 4 bits, so 2 hexadecimal digits make up 1 byte. 'int c' stores a 32-bit (4-byte) value on most current platforms.
It's also worth mentioning that the '0x' prefix marks a hexadecimal literal in the C language.

Computer memory is basically a sequence of 8-bit cells{1}, so a 4-byte value cannot fit into a single cell: an int holds a multi-byte value and must span several consecutive cells.
Unfortunately, architectures differ in the order in which they store those bytes.

* {1}: This is a simplification that may not hold in the real world! Look up NUMA systems.

x86 is a little-endian architecture, which means that the least-significant byte is stored first, that is, at the lowest address.
Do you know which byte of the hexadecimal value 0xFFAABBCC is the most-significant and which is the least-significant?
The terms simply describe how much each byte of a multi-byte value contributes to the number, just as the leftmost digit of a decimal number carries the most weight.
0xFF is the most-significant byte, whereas 0xCC is the least-significant one.

So answer me this: which byte of the variable c will be stored first in memory, the most-significant or the least-significant?
If you understood the content above, you know that it depends on the underlying arch.

On a little-endian arch, the bytes of 0xFFAABBCC will be stored in memory as follows (on a big-endian arch, 0xFF, the most-significant byte, would be stored first instead):

[0] = 0xCC
[1] = 0xBB
[2] = 0xAA
[3] = 0xFF


* [0] is the lowest address and [3] the highest.

* An example of a traditionally big-endian arch is PowerPC (PPC).

- So let's figure out what each piece of the code means:

The first step of the code is: ((char *)&c)
The & operator takes the address of the integer variable, which gives us a pointer to an integer. The explicit cast then converts that integer pointer into a character pointer.
From now on, it's a pointer to a single byte (an 8-bit value), located at the lowest address of c.

((char *)&c)[0]: The second step reads the byte that the character pointer points to.
As I told you, the least-significant byte is stored first on little-endian archs, so 0xCC is the output. If it were [1] instead, 0xBB would be output, since [1] references the second least-significant byte.
You can see how each byte was stored in memory by looking at the layout above.

If you understood this post, you won't have any trouble writing code to check the endianness of your machine. If you're feeling adventurous, take it as an exercise =)

Hope you liked it,
Raphael S. Carvalho.
