[26] Built-in / intrinsic / primitive data types
(Part of C++ FAQ Lite, Copyright © 1991-2006, Marshall Cline, cline@parashift.com)


[26.1] Can sizeof(char) be 2 on some machines? For example, what about double-byte characters?

No, sizeof(char) is always 1. Always. It is never 2. Never, never, never.

Even if you think of a "character" as a multi-byte thingy, char is not. sizeof(char) is always exactly 1. No exceptions, ever.

Look, I know this is going to hurt your head, so please, please just read the next few FAQs in sequence and hopefully the pain will go away by sometime next week.



[26.2] What are the units of sizeof?

Bytes.

For example, if sizeof(Fred) is 8, the distance between two Fred objects in an array of Freds will be exactly 8 bytes.

As another example, this means sizeof(char) is one byte. That's right: one byte. One, one, one, exactly one byte, always one byte. Never two bytes. No exceptions.
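Here is a minimal sketch of what "the units are bytes" means in practice. The struct Fred below is a hypothetical stand-in (this FAQ never defines Fred), and its actual size depends on your implementation; the only thing the code relies on is that neighboring array elements are exactly sizeof(Fred) bytes apart.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical Fred; its members are an assumption made for illustration.
struct Fred {
    int    id;
    double weight;
};

// sizeof(char) is 1 by definition, on every implementation.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

// Distance, in bytes, between element 0 and element 1 of an array of Freds.
inline std::size_t bytesBetweenNeighbors()
{
    Fred freds[2] = {};
    const char* first  = reinterpret_cast<const char*>(&freds[0]);
    const char* second = reinterpret_cast<const char*>(&freds[1]);
    return static_cast<std::size_t>(second - first);
}
```

Whatever value sizeof(Fred) has on your compiler, bytesBetweenNeighbors() should return that same value.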



[26.3] Whoa, but what about machines or compilers that support multibyte characters. Are you saying that a "character" and a char might be different?!?

Yes that's right: the thing commonly referred to as a "character" might be different from the thing C++ calls a char.
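A small sketch of the difference, assuming UTF-8 as the multibyte encoding (that choice is mine, not the FAQ's). The single "character" e-with-acute-accent occupies two chars; the bytes are spelled out with escapes so the example does not depend on how your source file is encoded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The one character U+00E9 encoded as UTF-8: two chars, then the '\0'.
inline const char* eAcuteUtf8()
{
    return "\xC3\xA9";
}

// Number of chars (C++ bytes) before the terminating '\0'.
// This counts chars, not "characters".
inline std::size_t charCount(const char* s)
{
    return std::strlen(s);
}
```

Here charCount(eAcuteUtf8()) is 2 even though a human reader sees one character, and sizeof(char) is still 1.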

I'm really sorry if that hurts, but believe me, it's better to get all the pain over with at once. Take a deep breath and repeat after me: "character and char might be different." There, doesn't that feel better? No? Well keep reading — it gets worse.



[26.4] But, but, but what about machines where a char has more than 8 bits? Surely you're not saying a C++ byte might have more than 8 bits, are you?!?

Yep, that's right: a C++ byte might have more than 8 bits.

The C++ language guarantees a byte must always have at least 8 bits. But there are implementations of C++ that have more than 8 bits per byte.
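If you want to know how many bits a byte has on your particular implementation, the standard header <climits> provides the CHAR_BIT macro. The sketch below is only an illustration; the helper name bitsIn is hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// CHAR_BIT is the number of bits in a C++ byte on this implementation.
static_assert(CHAR_BIT >= 8, "a C++ byte has at least 8 bits");

// Number of bits occupied by an object of type T (padding bits included).
template <typename T>
constexpr std::size_t bitsIn()
{
    return sizeof(T) * CHAR_BIT;
}
```

On most desktop machines bitsIn<char>() is 8, but portable code should ask rather than assume.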



[26.5] Okay, I could imagine a machine with 9-bit bytes. But surely not 16-bit bytes or 32-bit bytes, right?

Wrong.

I have heard of one implementation of C++ that has 64-bit "bytes." You read that right: a byte on that implementation has 64 bits. 64 bits per byte. 64. As in 8 times 8.

And yes, you're right, combining with the above would mean that a char on that implementation would have 64 bits.



[26.6] I'm sooooo confused. Would you please go over the rules about bytes, chars, and characters one more time?

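Here are the rules:

- A char is exactly one C++ byte, so sizeof(char) is always 1.
- A C++ byte has at least 8 bits, but might have more than 8 bits.
- A char* can address each individual byte.
- There are no bits between two adjacent bytes: walking through memory via a char* (or via memcpy()) sees every bit.
- The number of bits in a byte on your implementation is given by the CHAR_BIT macro from the header <climits>.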

Let's work an example to illustrate these rules. The PDP-10 has 36-bit words with no hardware facility to address anything within one of those words. That means a pointer can point only at things on a 36-bit boundary: it is not possible for a pointer to point 8 bits to the right of where some other pointer points.

One way to abide by all the above rules is for a PDP-10 C++ compiler to define a "byte" as 36 bits. Another valid approach would be to define a "byte" as 9 bits, and simulate a char* by two words of memory: the first could point to the 36-bit word, the second could be a bit-offset within that word. In that case, the C++ compiler would need to add extra instructions when compiling code using char* pointers. For example, the code generated for *p = 'x' might read the word into a register, then use bit-masks and bit-shifts to change the appropriate 9-bit byte within that word. An int* could still be implemented as a single hardware pointer, since C++ allows sizeof(char*) != sizeof(int*).
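The two-word char* described above can be sketched on an ordinary machine. Everything in this block is an assumption made for illustration: the 36-bit word is simulated in the low 36 bits of a 64-bit integer, bytes are numbered from the most significant end, and the names FatCharPtr, load, store, and next are hypothetical. It is not how any real PDP-10 compiler was written.

```cpp
#include <cassert>
#include <cstdint>

// One simulated 36-bit word, kept in the low 36 bits of a 64-bit integer.
typedef std::uint64_t Word36;

const unsigned kBitsPerByte = 9;   // the simulated C++ "byte"; 4 * 9 == 36
const Word36   kByteMask    = 0x1FF;  // nine 1-bits

// Hypothetical "fat" char*: a word address plus a bit offset in that word.
struct FatCharPtr {
    Word36*  word;       // which 36-bit word
    unsigned bitOffset;  // 0, 9, 18, or 27, counted from the left end
};

// How far to shift so the addressed byte lands in the low 9 bits.
inline unsigned shiftFor(const FatCharPtr& p)
{
    return 36 - kBitsPerByte - p.bitOffset;
}

// Equivalent of: value = *p
inline unsigned load(const FatCharPtr& p)
{
    return static_cast<unsigned>((*p.word >> shiftFor(p)) & kByteMask);
}

// Equivalent of: *p = value
// Read the word, mask out the old byte, then merge in the new one.
inline void store(const FatCharPtr& p, unsigned value)
{
    Word36 cleared = *p.word & ~(kByteMask << shiftFor(p));
    *p.word = cleared | ((static_cast<Word36>(value) & kByteMask) << shiftFor(p));
}

// Equivalent of: ++p (moves to the next word after the fourth byte)
inline FatCharPtr next(FatCharPtr p)
{
    p.bitOffset += kBitsPerByte;
    if (p.bitOffset == 36) {
        p.bitOffset = 0;
        ++p.word;
    }
    return p;
}
```

Note that store() touches only the nine bits it was asked to change, and next() never skips a bit, which is what lets a char* walk through all of memory.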

Using the same logic, it would also be possible to define a PDP-10 C++ "byte" as 12 bits or 18 bits. However, the above technique wouldn't allow us to define a PDP-10 C++ "byte" as 8 bits, since 8*4 is 32, meaning that after every fourth byte we would skip 4 bits. A more complicated approach could be used for those 4 bits, e.g., by packing nine bytes (of 8 bits each) into two adjacent 36-bit words. The important point here is that memcpy() has to be able to see every bit of memory: there can't be any bits between two adjacent bytes.

Note: one of the popular non-C/C++ approaches on the PDP-10 was to pack 5 bytes (of 7 bits each) into each 36-bit word. However, this won't work in C or C++ since 5*7 = 35, meaning using char*s to walk through memory would "skip" a bit every fifth byte (and also because C++ requires bytes to have at least 8 bits).
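The arithmetic in the PDP-10 discussion can be checked mechanically. This is only a sketch of the "simple" one-word packing scheme; the function names are hypothetical, and leftoverBits() assumes the bytes actually fit in the word (bits * count <= 36).

```cpp
#include <cassert>

// True if bytes of `bits` bits each tile a 36-bit word with nothing left
// over, and also meet C++'s minimum of 8 bits per byte.
inline bool tiles36BitWord(unsigned bits)
{
    return bits >= 8 && bits <= 36 && 36 % bits == 0;
}

// Bits left unused when `count` bytes of `bits` bits each are packed into
// one 36-bit word. Precondition: bits * count <= 36.
inline unsigned leftoverBits(unsigned bits, unsigned count)
{
    return 36 - bits * count;
}
```

With these, 9, 12, 18, and 36 all tile the word; four 8-bit bytes leave 4 bits over; and five 7-bit bytes leave 1 bit over (and fail the 8-bit minimum anyway).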



[26.7] What is a "POD type"?

A type that consists of nothing but Plain Old Data.

A POD type is a C++ type that has an equivalent in C, and that uses the same rules as C uses for initialization, copying, layout, and addressing.

As an example, the C declaration struct Fred x; does not initialize the members of the Fred variable x. To make this same behavior happen in C++, Fred would need to not have any constructors. Similarly to make the C++ version of copying the same as the C version, the C++ Fred must not have overloaded the assignment operator. To make sure the other rules match, the C++ version must not have virtual functions, base classes, non-static members that are private or protected, or a destructor. It can, however, have static data members, static member functions, and non-static non-virtual member functions.

The actual definition of a POD type is recursive and gets a little gnarly. Here's a slightly simplified definition of POD: a POD type's non-static data members must be public and can be of any of these types: bool, any numeric type including the various char variants, any enumeration type, any data-pointer type (that is, any type convertible to void*), any pointer-to-function type, or any POD type, including arrays of any of these. Note: data-pointers and pointers-to-function are okay, but pointers-to-member are not. Also note that references are not allowed. In addition, a POD type can't have constructors, a destructor, virtual functions, base classes, or an overloaded assignment operator.
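Here is a rough sketch of those rules in code. Be warned about the assumptions: the struct names are hypothetical, and the check uses the C++11 type traits std::is_trivial and std::is_standard_layout, which are newer than this FAQ. C++11's notion of POD is somewhat looser than the simplified rules above, so treat looksLikePod() as an approximation, not as the definition.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Follows the simplified rules: public data only, no constructors,
// no destructor, no virtual functions, no base classes.
struct PodFred {
    int         count;
    double      weight;
    const char* name;                    // data pointers are okay
    int total() const { return count; }  // non-virtual member functions are okay
};

struct NotPodCtor {
    int count;
    NotPodCtor() : count(0) {}           // a constructor disqualifies it
};

struct NotPodVirtual {
    int count;
    virtual int total() const { return count; }  // so does a virtual function
};

// C++11's approximation of "POD": trivial and standard-layout.
template <typename T>
constexpr bool looksLikePod()
{
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

// A POD can be copied byte-by-byte, just as in C.
inline PodFred copyBytes(const PodFred& src)
{
    PodFred dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}
```

PodFred passes the check and survives a memcpy(); the other two structs fail the check, so copying them with memcpy() is not something you should rely on.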



[26.8] When initializing non-static data members of built-in / intrinsic / primitive types, should I use the "initialization list" or assignment?

For symmetry, it is usually best to initialize all non-static data members in the constructor's "initialization list," even those that are of a built-in / intrinsic / primitive type. The FAQ shows you why and how.
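A minimal sketch of that advice, using a hypothetical class Fred with one built-in member and one class-type member. Both are set in the initialization list, so the constructor treats them the same way.

```cpp
#include <cassert>
#include <string>

// Hypothetical class used only for illustration.
class Fred {
public:
    // Built-in and class-type members alike go in the initialization list.
    Fred(int count, const std::string& name)
        : count_(count)
        , name_(name)
    { }

    int count() const { return count_; }
    const std::string& name() const { return name_; }

private:
    int         count_;  // built-in type
    std::string name_;   // class type
};
```

For the int, assignment in the constructor body would produce the same result; the list form simply keeps every member's initialization in one place.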



[26.9] When initializing static data members of built-in / intrinsic / primitive types, should I worry about the "static initialization order fiasco"?

Yes, if you initialize your built-in / intrinsic / primitive variable by an expression that the compiler doesn't evaluate solely at compile-time. The FAQ provides several solutions for this (subtle!) problem.
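To make the distinction concrete, here is a sketch with hypothetical names (Config, computeLimit). A single file cannot actually reproduce the fiasco, which needs two source files; the block only contrasts a compile-time initializer with one commonly used remedy, the construct-on-first-use idiom.

```cpp
#include <cassert>

// Hypothetical run-time computation; it stands in for anything the
// compiler does not evaluate solely at compile time.
inline int computeLimit()
{
    return 6 * 7;
}

struct Config {
    // Initialized by a constant expression: no ordering problem.
    static const int defaultSize = 16;

    // Construct-on-first-use: the local static is initialized the first
    // time limit() is called, so callers in other source files never see
    // it before it has been set.
    static int& limit()
    {
        static int value = computeLimit();
        return value;
    }
};
```

Callers write Config::limit() instead of naming a static data member directly, and that small change is what removes the dependency on initialization order.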



[26.10] Can I define an operator overload that works with built-in / intrinsic / primitive types?

No, the C++ language requires that your operator overloads take at least one operand of a "class type" or enumeration type. The C++ language will not let you define an operator all of whose operands / parameters are of primitive types.

For example, you can't define an operator== that takes two char*s and uses string comparison. That's good news because if s1 and s2 are of type char*, the expression s1 == s2 already has a well defined meaning: it compares the two pointers, not the two strings pointed to by those pointers. You shouldn't use pointers anyway. Use std::string instead of char*.
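A sketch of both halves of that answer. The enum Color and the helper names are hypothetical; the operator+ below assumes steps is not negative. The commented-out declaration is the kind the compiler rejects, because both of its parameters are built-in types.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Allowed: at least one operand is an enumeration type.
enum Color { Red, Green, Blue };

inline Color operator+(Color c, int steps)
{
    return static_cast<Color>((static_cast<int>(c) + steps) % 3);
}

// Not allowed: every operand is a built-in type, so this would not compile.
// bool operator==(const char* a, const char* b);

// What == already means for char*: are the two addresses equal?
inline bool samePointer(const char* a, const char* b)
{
    return a == b;
}

// What people usually want: do the two strings hold the same text?
inline bool sameText(const std::string& a, const std::string& b)
{
    return a == b;
}
```

Two separate char arrays that both hold "hello" are different pointers but the same text, which is why std::string is the better tool for the comparison.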

If C++ let you redefine the meaning of operators on built-in types, you wouldn't ever know what 1 + 1 is: it would depend on which headers got included and whether one of those headers redefined addition to mean, for example, subtraction.



[26.11] When I delete an array of some built-in / intrinsic / primitive type, why can't I just say delete a instead of delete[] a?

Because you can't.

Look, please don't write me an email asking me why C++ is what it is. It just is. If you really want a rationale, buy Bjarne Stroustrup's excellent book, "Design and Evolution of C++" (Addison-Wesley publishers). But if your real goal is to write some code, don't waste too much time figuring out why C++ has these rules, and instead just abide by its rules.

So here's the rule: if a points to an array of thingies that was allocated via new T[n], then you must, must, must delete it via delete[] a. Even if the elements in the array are built-in types. Even if they're of type char or int or void*. Even if you don't understand why.
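The rule in code, as a minimal sketch with hypothetical function names. The second function shows std::vector as an alternative; that is my addition, not something this FAQ entry discusses, and it is there only because it removes the need to remember the brackets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Allocate with new T[n], release with delete[], whatever T is.
inline int sumWithRawArray(std::size_t n)
{
    int* a = new int[n];
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<int>(i);
        sum += a[i];
    }
    delete[] a;   // not "delete a": a came from new int[n]
    return sum;
}

// Same job with std::vector, which releases its storage automatically.
inline int sumWithVector(std::size_t n)
{
    std::vector<int> a(n);
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<int>(i);
        sum += a[i];
    }
    return sum;
}
```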



[26.12] How can I tell if an integer is a power of two without looping?

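 // A positive power of two has exactly one bit set, and i & (i - 1)
 // clears the lowest set bit, so the result is zero only for powers of two.
 inline bool isPowerOf2(int i)
 {
   return i > 0 && (i & (i - 1)) == 0;
 }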
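Why this works, as a small sketch: subtracting 1 from a positive integer flips its lowest set bit and every bit below it, so ANDing the two values clears that lowest set bit. The helper clearLowestSetBit is a hypothetical name introduced here, and isPowerOf2 is repeated only so the block compiles on its own.

```cpp
#include <cassert>

// Clears the lowest set bit of i. Precondition: i > 0.
// Example in binary: 1100 & 1011 == 1000.
inline int clearLowestSetBit(int i)
{
    return i & (i - 1);
}

// Same test as the FAQ's function, written in terms of the helper.
// The i > 0 check runs first, so the helper never sees zero or a negative.
inline bool isPowerOf2(int i)
{
    return i > 0 && clearLowestSetBit(i) == 0;
}
```

A power of two has only one bit set, so clearing it leaves zero; anything else has at least one bit left over.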



Revised Mar 1, 2006