[C/C++] Surprises and Undefined Behavior From Unsigned Integer Promotion

Consider this code:

#include <limits>
#include <iostream>

int main()
{
   // assume this static assert passes
   static_assert(sizeof(unsigned short) < sizeof(int));
   unsigned short one = 1;
   unsigned short max = std::numeric_limits<unsigned short>::max();

   unsigned short sum = one + max;
   if (sum == one + max)
      std::cout << "sum = one + max, and sum == one + max\n";
   else
      std::cout << "sum = one + max, but sum != one + max\n";
   return 0;
}

Figure 1

When you run it you’ll get the output

sum = one + max, but sum != one + max

Here’s a link to it on wandbox if you want to try it out. For clarity, there’s no undefined behavior in the program and the compiler isn’t doing anything wrong.

Surprising?

C and C++ perform “integral promotion” when they encounter an operator (in the case of Figure 1, addition) that has at least one operand of integral type with lesser rank than type int. According to the integer promotion rules, that operand will be promoted to type int if all values representable by that operand’s type are representable by type int – otherwise the operand will be promoted to type unsigned int. In practice, the integral promotion rules specify that any integer type smaller (in bit-width) than type int will be implicitly converted by the compiler to type int.
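One way to see the promotion directly is to ask the compiler for the type of the expression. The following is a minimal sketch, assuming a platform where unsigned short is narrower than int; the alias name sum_type is only illustrative and does not appear in Figure 1.

```cpp
#include <type_traits>

// Assumption: unsigned short is narrower than int on this platform
// (true for the common 16 bit short / 32 bit int configuration).
static_assert(sizeof(unsigned short) < sizeof(int),
              "this sketch assumes unsigned short is narrower than int");

// The type of an addition of two unsigned short operands. decltype does
// not evaluate the expression, it only reports its type.
using sum_type = decltype(static_cast<unsigned short>(1) +
                          static_cast<unsigned short>(1));

// Under the assumption above, both operands are promoted to int before
// the addition, so the result type is int rather than unsigned short.
static_assert(std::is_same<sum_type, int>::value,
              "unsigned short + unsigned short has type int");

// Unary plus applies the same promotion to a single operand.
static_assert(std::is_same<decltype(+static_cast<unsigned short>(1)), int>::value,
              "+unsigned short has type int");
```

If the file compiles, every static_assert held, which means the addition really was carried out in type int on that compiler.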

In Figure 1, if the static_assert passes, the assignment

   unsigned short sum = one + max;

will be translated by the compiler into

   unsigned short sum = (unsigned short)((int)one + (int)max);

Commonly in today’s compilers, unsigned short is a 16 bit type, and int is a 32 bit type. For the sake of a concrete example, let’s assume our compiler has this common type specification. The variable max (of unsigned short type) in Figure 1 will be assigned the value 65535, and it will retain this value when converted to type int. The variable one obviously contains the value 1, and will retain that value after being converted to type int. The addition of these two (converted/promoted) type int values will result in the value 65536, which is easily representable in a 32 bit int type, so there is no overflow or undefined behavior from this addition. The compiler will cast this result from type int to type unsigned short in order to assign the result to variable sum. The value 65536 isn’t representable in a 16 bit unsigned short (sum’s type), but the conversion is well-defined in C and C++; the conversion is performed modulo 2^N, where N is the bit width of type unsigned short. In this example, N=16 and thus the conversion of 65536 will result in the value 0, which will be assigned to sum.

A similar process takes place for the line

if (sum == one + max)

except that there is never any final narrowing conversion back to unsigned short. In more detail, the equality operator above has the variable sum for its left hand side operand, so the compiler must promote sum (which has type unsigned short) to type int according to the integral promotion rules. Likewise, one and max must be promoted to type int, since they are operands for the addition operator. Their summation is the type int value 65536. When evaluating the conditional, 65536 compares as unequal to the promoted value of sum (which got assigned the value 0 earlier), and so the program reports that “sum = one + max, but sum != one + max”. When sum was assigned, a narrowing conversion took place, but the right hand side of the conditional never performs any narrowing conversion at all.
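The two steps can be checked at compile time. This is only a sketch, assuming a 16 bit unsigned short and a wider int as in the example above; the namespace name figure1 is made up here so that the variable names can mirror Figure 1.

```cpp
#include <limits>

// Assumption: 16 bit unsigned short and an int wide enough to hold 65536.
static_assert(std::numeric_limits<unsigned short>::max() == 65535 &&
              std::numeric_limits<int>::max() > 65536,
              "this sketch assumes 16 bit unsigned short and a wider int");

namespace figure1 {

constexpr unsigned short one = 1;
constexpr unsigned short max = std::numeric_limits<unsigned short>::max();

// one + max is computed in int and equals 65536; the explicit cast performs
// the same modulo 2^16 narrowing that the assignment in Figure 1 performs.
constexpr unsigned short sum = static_cast<unsigned short>(one + max);

static_assert(one + max == 65536, "the addition happens in int, no wraparound");
static_assert(sum == 0, "65536 narrows to 0 in a 16 bit unsigned short");

// The comparison promotes sum to int, so it compares 0 against 65536.
static_assert(sum != one + max, "narrowed sum differs from the int result");

// Narrowing the right hand side as well makes the two sides agree.
static_assert(sum == static_cast<unsigned short>(one + max),
              "both sides narrowed to unsigned short compare equal");

} // namespace figure1
```

The last assertion shows one way to make the comparison behave as the programmer probably intended: apply the same narrowing conversion on both sides.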

Hidden integral promotions and narrowing conversions are subtle, and the end results can be surprising. Generally speaking, the promotion of *signed* integral types does not cause any surprises or problems at all. It is the promotion of *unsigned* integral types that is problematic.

Among the problems is that it can create non-portable code:

#include <limits>
#include <iostream>

int main()
{
   unsigned short one = 1;
   unsigned short max = std::numeric_limits<unsigned short>::max();
   unsigned int sum = one + max;
   std::cout << "sum == " << sum << "\n";
   return 0;
}

Figure 2

If you run Figure 2 on a system where unsigned short and int are both 16 bit types, the program will output “sum == 0”. Since int cannot represent every value of unsigned short on such a system, the operands one and max will be promoted to type unsigned int rather than type int, and the addition will wrap around in a well-defined manner, resulting in 0. If on the other hand you run Figure 2 on a system where unsigned short is a 16 bit type and int is a 32 bit type, the operands one and max will be promoted to type int prior to the addition and no overflow will occur; the program will output “sum == 65536”.
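If the intent in Figure 2 is to always get the mathematical sum, one possible approach is to pick the arithmetic type explicitly instead of relying on whatever int happens to be. The sketch below assumes unsigned long is wider than unsigned short (the standard guarantees unsigned long has at least 32 bits, but not that unsigned short is narrower); widened_sum is a hypothetical helper, not something from the original program.

```cpp
#include <limits>

// Assumption: unsigned long can hold the sum of any two unsigned short
// values' worth of range we care about here, i.e. it is strictly wider.
static_assert(std::numeric_limits<unsigned short>::max() <
              std::numeric_limits<unsigned long>::max(),
              "this sketch assumes unsigned long is wider than unsigned short");

// Hypothetical helper: adds two unsigned short values in unsigned long.
constexpr unsigned long widened_sum(unsigned short a, unsigned short b)
{
   // Converting one operand to unsigned long makes the usual arithmetic
   // conversions convert the other operand to unsigned long as well, so
   // the addition no longer depends on the size of int.
   return static_cast<unsigned long>(a) + b;
}
```

With a 16 bit unsigned short, widened_sum(1, 65535) yields 65536 whether int is 16 bits or 32 bits wide.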

Undefined Behavior Due to Unsigned Integral Promotion

Now that we’re slightly familiar with integral promotion, let’s look at a small function:

unsigned short multiply(unsigned short x, unsigned short y)
{
   // assume this static assert passes
   static_assert(sizeof(unsigned short) * 2 == sizeof(int));

   unsigned short result = x * y;
   return result;
}

Figure 3

Despite all lines seeming to involve only type unsigned short, there is a potential for undefined behavior in Figure 3 on line 6 due to possible signed integer overflow on type int. The compiler will implicitly perform integral promotion on line 6, so that the multiplication will involve two (promoted/converted) operands of type int, not type unsigned short. If for our compiler unsigned short is 16 bit and int is 32 bit, then any product of x and y larger than 2^31 - 1 (the largest value a 32 bit int can hold) will overflow the signed type int. And unfortunately, signed integral overflow is undefined behavior. It doesn’t matter that overflow of unsigned integral types is well-defined behavior in C and C++. No multiplication of values of type unsigned short ever occurs in this function.
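A common way to sidestep the problem in Figure 3 is to force the multiplication to happen in an unsigned type. The following is a minimal sketch of that idea, not the only possible fix; multiply_unsigned is a name chosen here for illustration.

```cpp
// Hypothetical variant of Figure 3 that multiplies in unsigned int.
constexpr unsigned short multiply_unsigned(unsigned short x, unsigned short y)
{
   // Converting x to unsigned int makes the usual arithmetic conversions
   // convert y to unsigned int too. Unsigned multiplication wraps around
   // in a well-defined way instead of overflowing a signed int.
   // The outer cast narrows the result modulo 2^N, where N is the bit
   // width of unsigned short, exactly as the assignment in Figure 3 would.
   return static_cast<unsigned short>(static_cast<unsigned int>(x) * y);
}
```

Assuming a 16 bit unsigned short, multiply_unsigned(65535, 65535) returns 1, which is the value you would expect from wrapping 16 bit unsigned arithmetic. Because the function is constexpr, it can also be used in a static_assert; a constant expression is not allowed to contain undefined behavior, so the compiler would reject the original multiplication for those arguments when evaluated that way.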

Let’s consider a longer toy example with the exact same problem:

#include <limits>

unsigned short toy_multiply(unsigned short x)
{
   // assume this static assert passes
   static_assert(sizeof(unsigned short) == 2 && sizeof(int) == 4);
   unsigned short max = std::numeric_limits<unsigned short>::max();

   unsigned short result = max * x;
   if (x < max - 10)
      return 0;
   return result;
}

Figure 4

This function too has the potential for undefined behavior, for the same reason as Figure 3. It has an extra twist though. A perverse consequence of the potential for undefined behavior in Figure 4 is that any compiler would be within its rights to generate “optimized” object code for the function (if the static_assert succeeds) that is very fast and almost certainly unintended by the programmer, equivalent to

unsigned short toy_multiply(unsigned short x)
{
   return 0;
}

To see why, consider from Figure 4 the lines

unsigned short result = max * x;
if (x < max - 10)
   return 0;

For all values of x that fail this conditional, the multiplication product of max * x would have certainly overflowed. max and x are both promoted to type int prior to multiplying, and any value of x that fails the conditional must be very close to max. Taking into account the static assert, the maximum possible 16 bit unsigned value times a value very close to max will easily overflow a 32 bit signed int. And signed integral overflow is undefined behavior.
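The arithmetic behind that claim can be spelled out with a wider type. This sketch uses long long so that the products themselves do not overflow; the constant names are invented for the illustration and assume the 16 bit / 32 bit sizes from the static assert in Figure 4.

```cpp
// Values assumed from Figure 4: 16 bit unsigned short, 32 bit int.
constexpr long long max16 = 65535;          // largest unsigned short
constexpr long long int32_max = 2147483647; // largest 32 bit int, 2^31 - 1

// The smallest x that fails the conditional `x < max - 10`.
constexpr long long smallest_failing_x = max16 - 10;

// The smallest product max * x among all x that fail the conditional.
constexpr long long smallest_product = max16 * smallest_failing_x;

static_assert(smallest_failing_x == 65525, "x must be at least 65525");
static_assert(smallest_product == 4294180875LL, "65535 * 65525");
static_assert(smallest_product > int32_max,
              "even the smallest such product exceeds a 32 bit int");
```

Since even the smallest product is about twice the largest 32 bit int value, every x that reaches the final return statement has already caused signed overflow in the multiplication.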

For better or worse, modern C/C++ compilers commonly use undefined behavior to optimize, by taking advantage of the fact that undefined behavior is impossible in any valid code. It’s somewhat controversial whether compilers really ought to ever do this, but the reality is that in the present day it is a very common optimization technique, and nothing in the C/C++ standards forbids it. With regard to Figure 4, this means a compiler might assume the conditional in toy_multiply() will always succeed – since the alternative would be undefined behavior from the overflow, which is impossible for valid code. Furthermore, the compiler could assume that any code that calls toy_multiply() will never call the function with an argument that would result in undefined behavior, because if it did the calling code would be invalid. If the conditional always succeeds, the compiler can drastically simplify the function. The result of the simplification is a function that does nothing but return 0. [If it’s any reassurance, I haven’t found a compiler that currently performs this optimization on Figure 4.]

Let’s look at one last, rather contrived, toy function:

unsigned short toy_shift(unsigned short x, unsigned short y) 
{
   // assume this static assert passes
   static_assert(sizeof(unsigned short) < sizeof(int));

   unsigned short result = (x-y) << 1;
   if (x >= y)
      return 0;
   return result;
}

Figure 5

The subtraction operator in Figure 5 has two unsigned short operands x and y, both of which will be promoted to type int. If x is less than y then the result of the subtraction will be a negative number, and left shifting a negative number is undefined behavior (in C, and in C++ prior to C++20). Keep in mind that if the subtraction had involved unsigned integral types (as it would appear on the surface), the result would have underflowed in a well-defined manner and wrapped around to become a large positive number, and the left shift would have been well-defined. But since integral promotion occurs, the result is a negative number and the left shift is undefined behavior. For similar reasons as given for Figure 4, the compiler could potentially “optimize” the code of Figure 5 so that it does nothing but return 0.
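As with the multiplication, the shift can be made well-defined by doing the arithmetic in an unsigned type. This is a sketch of one such approach; shift_difference is a hypothetical name and only covers the expression (x-y) << 1 from Figure 5.

```cpp
// Hypothetical helper: computes (x - y) << 1 using unsigned int arithmetic.
constexpr unsigned short shift_difference(unsigned short x, unsigned short y)
{
   // The subtraction happens in unsigned int, so x < y wraps around to a
   // large positive value instead of producing a negative int. Shifting
   // an unsigned value left is well-defined; bits shifted out are dropped.
   // The outer cast narrows modulo 2^N for an N bit unsigned short.
   return static_cast<unsigned short>((static_cast<unsigned int>(x) - y) << 1);
}
```

Assuming a 16 bit unsigned short, shift_difference(1, 2) gives 65534, the same value that pure 16 bit unsigned arithmetic would produce (65535 shifted left by one, reduced modulo 65536).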

The Integral Types Which May be Promoted

Integral promotion involves some implementation-defined behavior. It’s up to the compiler to define the exact sizes for the types char, unsigned char, signed char, short, unsigned short, int, unsigned int, long, unsigned long, long long, and unsigned long long. The only way to know if one of these types has a larger bit-width than another is to check your compiler’s documentation, or to compile/run a program that outputs the sizeof() result for the types. Thus it’s implementation defined whether int has a larger bit width than unsigned short, and by extension it’s implementation defined whether unsigned short will be promoted to type int. The standard does effectively guarantee that types int, unsigned int, long, unsigned long, long long, and unsigned long long will never be promoted. Floating point types of course are never subjected to integral promotion.
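Instead of printing sizeof() results, you can also ask the compiler at compile time what a type promotes to. The trait below is a sketch; promotes_to_signed_int is not a standard library facility, just a hypothetical helper built from std::is_same and unary plus.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical trait: true when the unsigned type T is promoted to the
// signed type int on this implementation. Unary plus applies the integral
// promotions, so decltype(+T()) is the promoted type.
template <class T>
struct promotes_to_signed_int
   : std::integral_constant<bool,
        std::is_unsigned<T>::value &&
        std::is_same<decltype(+T()), int>::value>
{};

// Guaranteed on every implementation: these types are never promoted.
static_assert(!promotes_to_signed_int<unsigned int>::value, "never promoted");
static_assert(!promotes_to_signed_int<unsigned long>::value, "never promoted");
static_assert(!promotes_to_signed_int<unsigned long long>::value, "never promoted");

// Implementation defined: inspect these on your own compiler.
constexpr bool ushort_promotes = promotes_to_signed_int<unsigned short>::value;
constexpr bool uchar_promotes = promotes_to_signed_int<unsigned char>::value;

// The rule itself: promotion goes to int exactly when int can represent
// every value of the source type.
static_assert(ushort_promotes ==
                 (std::numeric_limits<unsigned short>::max() <=
                  std::numeric_limits<int>::max()),
              "unsigned short promotes to int iff int can hold all its values");
```

A project that depends on a particular answer could turn ushort_promotes into a static_assert of its own, so that a port to an unusual platform fails loudly at compile time.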

But this leaves far more integral types than you might expect which may (potentially) be promoted. A non-exhaustive list of types that might be promoted is

char, unsigned char, signed char, short, unsigned short, int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, int_fast8_t, uint_fast8_t, int_least8_t, uint_least8_t, int_fast16_t, uint_fast16_t, int_least16_t, uint_least16_t, int_fast32_t, uint_fast32_t, int_least32_t, uint_least32_t, int_fast64_t, uint_fast64_t, int_least64_t, uint_least64_t

Surprisingly, all the sized integral types (int32_t, uint64_t, etc) are open to possible integral promotion, dependent upon the implementation-defined size of int.  For example, it’s not unreasonable to think that today there may already be a compiler for special purpose hardware that defines int as a 64 bit type, and if so, int32_t and uint32_t will be subject to promotion to that larger int type.  In theory there’s nothing in the standard that would prevent a future compiler from defining int as even a 128 bit type, and so we have to include int64_t and uint64_t in the list of types that could perhaps be promoted, all dependent on how the compiler defines type int.

Very realistically in code today, unsigned char, unsigned short, uint8_t and uint16_t (and also uint_least8_t, uint_least16_t, uint_fast8_t, uint_fast16_t) should be considered a minefield for programmers and maintainers.  On most compilers (defining int as at least 32 bit), these types don’t behave as expected.  They will usually be promoted to type int during operations and comparisons, and so they will be vulnerable to all the undefined behavior of the signed type int. They will not be protected by any well-defined behavior of the original unsigned type, since after promotion the types are no longer unsigned.
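One defensive idiom for these types is to route every operand through a small helper that guarantees an unsigned arithmetic type of at least unsigned int rank. The sketch below assumes std::uint16_t exists on the platform; unsigned_operand and multiply_u16 are hypothetical names introduced only for this example.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: converts an unsigned value to a type that is safe
// for arithmetic. Adding 0u forces the usual arithmetic conversions, so a
// type that would have been promoted to int becomes unsigned int, while
// wider unsigned types keep their own type.
template <class T>
constexpr auto unsigned_operand(T value) -> decltype(0u + value)
{
   static_assert(std::is_unsigned<T>::value,
                 "intended for unsigned integer types only");
   return 0u + value;
}

// Example use: a 16 bit multiply that never goes through signed int.
constexpr std::uint16_t multiply_u16(std::uint16_t x, std::uint16_t y)
{
   // Both operands are unsigned int (or wider), so the product wraps in a
   // well-defined way; the cast then narrows it modulo 2^16.
   return static_cast<std::uint16_t>(unsigned_operand(x) * unsigned_operand(y));
}
```

The appeal of this design is that it does not need to know the size of int: on a platform where the type is not promoted to int at all, the helper simply returns the value in an unsigned type that is at least as wide.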

Reference

The C++17 standard has multiple sections that involve integral promotion. For reference, here are the excerpts/summaries from the relevant parts of the C++17 standard draft:

7.6 Integral promotions [conv.prom]
1 A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (7.15) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.

8 Expressions [expr]
11 Many binary operators that expect operands of arithmetic or enumeration type cause conversions […These are] called the usual arithmetic conversions.
[… If neither operand has scoped enumeration type, type long double, double, or float,] the integral promotions (7.6) shall be performed on both operands.

8.3.1 Unary operators [expr.unary.op] (parts 7, 8, 10)
[For the unary operators +, -, ~, the operands are subject to integral promotion.]

8.6 Multiplicative operators [expr.mul]
[Binary operators *, /, %]
2 The usual arithmetic conversions are performed on the operands and determine the type of the result.

8.7 Additive operators [expr.add]
1 The additive [binary] operators + and - group left-to-right. The usual arithmetic conversions are performed for operands of arithmetic or enumeration type.

8.8 Shift operators [expr.shift]
[For the binary operators << and >>, the operands are subject to integral promotion.]

8.9 Relational operators [expr.rel]
[<, <=, >, >=]
2 The usual arithmetic conversions are performed on operands of arithmetic or enumeration type

8.10 Equality operators [expr.eq]
[==, !=]
6 If both operands are of arithmetic or enumeration type, the usual arithmetic conversions are performed on both operands

8.11 Bitwise AND operator [expr.bit.and]
1 The usual arithmetic conversions are performed;

8.12 Bitwise exclusive OR operator [expr.xor]
1 The usual arithmetic conversions are performed;

8.13 Bitwise inclusive OR operator [expr.or]
1 The usual arithmetic conversions are performed;
