[C/C++] Surprises and Undefined Behavior From Unsigned Integer Promotion

“There are far too many integer types, there are far too lenient rules for mixing them together, and it’s a major bug source, which is why I’m saying stay as simple as you can, use [signed] integers til you really really need something else.” -Bjarne Stroustrup, (Q&A at 43:00)

“Use [signed] ints unless you need something different, then still use something signed until you really need something different, then resort to unsigned.” -Herb Sutter, (same Q&A)

This is good and easy advice. Though if you’re curious, you probably also want to know why it’s good advice. And if you really need an unsigned type, you probably want to know if there’s anything you can or should do to avoid bugs; for solutions, just skip ahead to the Recommendations.

Surprises

You could run into a few problems with unsigned integer types. The following code might be quite surprising:

#include <limits>
#include <iostream>

int main()
{
   // assume this static assert passes
   static_assert(sizeof(unsigned short) < sizeof(int));
   unsigned short one = 1;
   unsigned short max = std::numeric_limits<unsigned short>::max();

   unsigned short sum = one + max;
   if (sum == one + max)
      std::cout << "sum = one + max, and sum == one + max\n";
   else
      std::cout << "sum = one + max, but sum != one + max\n";
   return 0;
}

Figure 1

If you run it you’ll get the output

sum = one + max, but sum != one + max

Here’s a link to it on wandbox if you want to try it out. The program contains no undefined behavior, and the result isn’t caused by a compiler bug.

The surprising result occurs due to “integral promotion”. You can see the provided link for the precise integer promotion rules, but in practice, the rules mean that during a math operation or comparison, any integer types smaller (in bit-width) than type int will be implicitly converted by the compiler to type int.

This means that in Figure 1, if the static_assert passes, the assignment

   unsigned short sum = one + max;

will be translated by the compiler into

   unsigned short sum = (unsigned short)((int)one + (int)max);

Let’s work with concrete numbers and assume your compiler uses a 16 bit unsigned short type and a 32 bit int type – this is very common, though not universal. In Figure 1, the unsigned short variable max will be assigned the value 65535, and will retain this value when converted to type int. The variable one will be assigned the value 1, and will retain this value after being converted to type int. The addition of these two (converted/promoted) type int values results in the value 65536, which is easily representable in a 32 bit int type, and so there won’t be any overflow or undefined behavior from the addition. The compiler will cast that result from type int to type unsigned short in order to assign it to variable sum. The value 65536 isn’t representable in a 16 bit unsigned short (sum’s type), but the conversion is well-defined in C and C++; the conversion gets performed modulo 2^N, where N is the bit width of type unsigned short. In this example, N=16 and thus the conversion of 65536 will result in the value 0, which will be assigned to sum.

A similar process occurs on the line

if (sum == one + max)

except that there isn’t any final narrowing conversion back to unsigned short. Here’s what happens: As before, one and max are promoted to type int prior to addition, resulting in a type int summation value of 65536. When evaluating the comparison, the left hand side (sum) is promoted to type int, and the right hand side (the summation 65536) is already type int. 65536 compares as unequal to sum, since sum was assigned the value 0 earlier. A narrowing conversion to unsigned short took place when sum was assigned. However, the equality operator works with operands of type int, and so the right hand side summation never gets a similar conversion down to unsigned short. It stays as type int. We end up with the unexpected output “sum = one + max, but sum != one + max”.
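To make the two evaluations concrete, here is a minimal sketch. The helper names are made up for this example, and it assumes (like Figure 1) that unsigned short is narrower than int. It checks the type of one + max at compile time and then compares the test from Figure 1 against a version that narrows the right hand side too.

```cpp
#include <limits>
#include <type_traits>

// Assumption carried over from Figure 1: unsigned short is narrower than int.
static_assert(sizeof(unsigned short) < sizeof(int), "");

// Hypothetical helper: the comparison as it is evaluated in Figure 1.
inline bool equal_without_cast()
{
   unsigned short one = 1;
   unsigned short max = std::numeric_limits<unsigned short>::max();
   // Both operands are promoted, so the sum has type int.
   static_assert(std::is_same<decltype(one + max), int>::value, "");
   // Same narrowing as the implicit conversion in Figure 1.
   unsigned short sum = static_cast<unsigned short>(one + max);
   // The right hand side stays an un-narrowed int.
   return sum == one + max;
}

// Hypothetical helper: the right hand side is narrowed the same way as sum.
inline bool equal_with_cast()
{
   unsigned short one = 1;
   unsigned short max = std::numeric_limits<unsigned short>::max();
   unsigned short sum = static_cast<unsigned short>(one + max);
   return sum == static_cast<unsigned short>(one + max);
}
```

Under the stated assumption, equal_without_cast() returns false and equal_with_cast() returns true.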

Hidden integral promotions and narrowing conversions are subtle, and the results can be surprising, which is usually a very bad thing. For *signed* integral types, there generally isn’t any problem with promotion. It’s the promotion of *unsigned* integral types that’s problematic and bug-prone.

Let’s look at a second surprise from unsigned integer promotion:

#include <limits>
#include <iostream>

int main()
{
   unsigned short one = 1;
   unsigned short max = std::numeric_limits<unsigned short>::max();
   unsigned int sum = one + max;
   std::cout << "sum == " << sum << "\n";
   return 0;
}

Figure 2

If you run Figure 2 on a system where unsigned short and int are both 16-bit types, the program will output “sum == 0”. Since a 16-bit int can’t represent every value of a 16-bit unsigned short, the operands one and max will be promoted to unsigned int rather than to int, and the addition will wrap around in a well-defined manner, resulting in 0. If on the other hand you run Figure 2 on a system where unsigned short is a 16-bit type and int is a 32-bit type, the operands one and max will be promoted to type int prior to the addition and no overflow will occur; the program will output “sum == 65536”. The integral promotion results in non-portable code.
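If you want to know which of the two cases applies to your own compiler, you can ask it at compile time. The following is only a sketch, with names invented for this example. It relies on the fact that unary + applies the integral promotions to its operand.

```cpp
#include <limits>
#include <type_traits>

// The type that unsigned short is promoted to on this implementation.
// Unary + applies the integral promotions to its operand.
using promoted_ushort_t = decltype(+static_cast<unsigned short>(0));

// True when int has enough value bits to represent every unsigned short value.
constexpr bool int_holds_all_ushort =
   std::numeric_limits<unsigned short>::digits <= std::numeric_limits<int>::digits;

// unsigned short promotes to int when int can hold all of its values,
// and to unsigned int otherwise.
static_assert(std::is_same<promoted_ushort_t,
                 std::conditional<int_holds_all_ushort, int, unsigned int>::type>::value,
              "unexpected promoted type for unsigned short");

// Hypothetical helper: reports whether Figure 2's addition is done in type int.
constexpr bool figure2_adds_as_int()
{
   return std::is_same<promoted_ushort_t, int>::value;
}
```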

Undefined Behavior

Now that we’re familiar with integral promotion, let’s look at a simple function:

unsigned short multiply(unsigned short x, unsigned short y)
{
   // assume this static assert passes
   static_assert(sizeof(unsigned short) * 2 == sizeof(int));

   unsigned short result = x * y;
   return result;
}

Figure 3

Despite all lines seeming to involve only type unsigned short, there is a potential for undefined behavior in Figure 3 on line 6 due to possible signed integer overflow on type int. The compiler will implicitly perform integral promotion on line 6, so that the multiplication will involve two (promoted/converted) operands of type int, not of type unsigned short. If for our compiler unsigned short is 16 bit and int is 32 bit, then any product of x and y larger than 2^31 - 1 (the largest value of type int) will overflow the signed type int. And unfortunately, signed integral overflow is undefined behavior. It doesn’t matter that overflow of unsigned integral types is well-defined behavior in C and C++. No multiplication of values of type unsigned short ever occurs in this function.
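To get a feel for which arguments are affected, here is a small sketch that computes the exact product in a wider type and compares it against INT_MAX. The helper name is hypothetical, and the sketch makes the same size assumption as Figure 3.

```cpp
#include <climits>

// Assumption carried over from Figure 3.
static_assert(sizeof(unsigned short) * 2 == sizeof(int), "");

// Hypothetical helper: true when x * y, evaluated in type int as in Figure 3,
// would exceed INT_MAX and therefore overflow.
inline bool product_overflows_int(unsigned short x, unsigned short y)
{
   // unsigned long long is at least as wide as int, so it holds the exact product.
   unsigned long long exact = static_cast<unsigned long long>(x) * y;
   return exact > static_cast<unsigned long long>(INT_MAX);
}
```

With a 16 bit unsigned short and a 32 bit int, the arguments don’t have to be anywhere near the maximum: 46340 * 46340 still fits in an int, but 46341 * 46341 does not.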

Let’s finally look at a contrived toy function:

unsigned short toy_shift(unsigned short x, unsigned short y) 
{
   // assume this static assert passes
   static_assert(sizeof(unsigned short) < sizeof(int));

   unsigned short result = (x-y) << 1;
   if (x >= y)
      return 0;
   return result;
}

Figure 4

The subtraction operator in Figure 4 has two unsigned short operands x and y, both of which will be promoted to type int. If x is less than y then the result of the subtraction will be a negative number, and left shifting a negative number is undefined behavior in C, and in C++ prior to C++20 (C++20 gave this shift a defined result). Keep in mind that if the subtraction had involved unsigned integral types (as it would appear on the surface), the result would have underflowed in a well-defined manner and wrapped around to become a large positive number, and the left shift would have been well-defined. But since integral promotion occurs, the result of a left shift when x is less than y would be undefined behavior.

An interesting consequence of the potential for undefined behavior in Figure 4 is that any compiler would be within its rights to generate “optimized” object code for the function (if the static_assert succeeds) that is very fast and almost certainly unintended by the programmer, equivalent to

unsigned short toy_shift(unsigned short x, unsigned short y)
{
   return 0;
}

To see why, we need to understand how modern compilers can use undefined behavior. For better or worse, modern C/C++ compilers commonly use undefined behavior to optimize, by taking advantage of the fact that undefined behavior is impossible in any valid code. It’s somewhat controversial whether compilers really ought to ever do this, but the reality is that in the present day it’s an extremely common optimization technique, and nothing in the C/C++ standards forbids it. With regard to Figure 4, this means a compiler could assume the conditional (x >= y) in toy_shift() will always succeed – because the alternative would be that the function had undefined behavior from left shifting a negative number, and the compiler knows that undefined behavior is impossible for valid code. The compiler always assumes that we have written valid code unless it can prove otherwise (in which case we’d get a compiler error message). We might incorrectly think that the compiler can’t make any assumptions about the arguments to the toy_shift() function because it can’t predict what arbitrary calling code might do, but the compiler can make some limited predictions. It can assume that calling code will never use any arguments that result in undefined behavior, because getting undefined behavior would be impossible from valid calling code. The compiler can therefore conclude that with valid code, there is no scenario in which the conditional could possibly fail, and it could use this knowledge to “optimize” the function, producing object code that simply returns 0. [For scant reassurance, I haven’t seen a compiler do this (yet) for Figure 4.]
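For comparison, here is one possible way to write the toy function so that the arithmetic really is unsigned. This is a sketch rather than the only fix, and the function name is made up for this example.

```cpp
#include <limits>

// Hypothetical rewrite of Figure 4 that keeps the arithmetic unsigned.
inline unsigned short toy_shift_unsigned(unsigned short x, unsigned short y)
{
   // x + 0u has type unsigned int, so the subtraction wraps around
   // instead of going negative, and the left shift is well defined.
   unsigned short result = static_cast<unsigned short>((x + 0u - y) << 1);
   if (x >= y)
      return 0;
   return result;
}
```

Since no argument values can trigger undefined behavior here, the compiler has no grounds for assuming that the conditional always succeeds. With a 16 bit unsigned short, toy_shift_unsigned(1, 2) returns 65534.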

The Integral Types Which May be Promoted

Integral promotion involves some implementation-defined behavior. It’s up to the compiler to define the exact sizes for the types char, unsigned char, signed char, short, unsigned short, int, unsigned int, long, unsigned long, long long, and unsigned long long. The only way to know if one of these types has a larger bit-width than another is to check your compiler’s documentation, or to compile/run a program that outputs the sizeof() result for the types. Thus it’s implementation defined whether int has a larger bit width than unsigned short, and by extension it’s implementation defined whether unsigned short will be promoted to type int. The standard does effectively guarantee that types int, unsigned int, long, unsigned long, long long, and unsigned long long will never be promoted. Floating point types of course are never subjected to integral promotion.
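A program along those lines could look like the following sketch. The helper names are invented for this example, and note that sizeof reports storage size, which on unusual implementations could include padding bits.

```cpp
#include <climits>
#include <iostream>

// Hypothetical helper: storage size of T in bits
// (sizeof counts bytes, and a byte has CHAR_BIT bits).
template <class T>
constexpr unsigned long bits_of()
{
   return static_cast<unsigned long>(sizeof(T)) * CHAR_BIT;
}

// Prints the sizes that decide which types get promoted on this implementation.
inline void print_integer_sizes(std::ostream& out)
{
   out << "unsigned char:      " << bits_of<unsigned char>() << " bits\n"
       << "unsigned short:     " << bits_of<unsigned short>() << " bits\n"
       << "int:                " << bits_of<int>() << " bits\n"
       << "unsigned long:      " << bits_of<unsigned long>() << " bits\n"
       << "unsigned long long: " << bits_of<unsigned long long>() << " bits\n";
}
```

Calling print_integer_sizes(std::cout) from main() prints one line per type. Any unsigned type that comes out narrower than int should be expected to promote to int.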

But this leaves far more integral types than you might expect which may (at least in principle) be promoted. A non-exhaustive list of types that might be promoted is

char, unsigned char, signed char, short, unsigned short, int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, int_fast8_t, uint_fast8_t, int_least8_t, uint_least8_t, int_fast16_t, uint_fast16_t, int_least16_t, uint_least16_t, int_fast32_t, uint_fast32_t, int_least32_t, uint_least32_t, int_fast64_t, uint_fast64_t, int_least64_t, uint_least64_t

Surprisingly, all the sized integral types (int32_t, uint64_t, etc) are open to possible integral promotion, dependent upon the implementation-defined size of int.  For example, it’s plausible that there could someday be a compiler that defines int as a 64 bit type, and if so, int32_t and uint32_t will be subject to promotion to that larger int type.  Compiler writers are likely to understand this could break older code that implicitly assumes uint32_t won’t be promoted, but there’s no guarantee. In theory there’s nothing in the standard that would prevent a future compiler from defining int as even a 128 bit type, and so we have to include int64_t and uint64_t in the list of types that could at least in theory be promoted, all dependent on how the compiler defines type int.

Very realistically in code today, unsigned char, unsigned short, uint8_t and uint16_t (and also uint_least8_t, uint_least16_t, uint_fast8_t, uint_fast16_t) should be considered a minefield for programmers and maintainers.  On most compilers (defining int as at least 32 bit), these types don’t behave as expected.  They will usually be promoted to type int during operations and comparisons, and so they will be vulnerable to all the undefined behavior of the signed type int. They won’t be protected by any well-defined behavior of the original unsigned type, since after promotion the types are no longer unsigned.
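As a small illustration of “don’t behave as expected”, the sketch below subtracts 1 from 0 using a small unsigned type and then using unsigned int. The helper names are hypothetical, and the sketch assumes unsigned char is narrower than int.

```cpp
#include <climits>

// Assumption: unsigned char is narrower than int, as on mainstream platforms.
static_assert(sizeof(unsigned char) < sizeof(int), "");

// Hypothetical helper: with a promoted type, 0 - 1 is the int value -1.
inline bool small_unsigned_difference_is_negative()
{
   unsigned char zero = 0;
   unsigned char one = 1;
   return (zero - one) < 0;   // both operands are promoted to int
}

// Hypothetical helper: with unsigned int operands, 0 - 1 wraps around instead.
inline bool unsigned_int_difference_wraps()
{
   unsigned int zero = 0;
   unsigned int one = 1;
   return (zero - one) == UINT_MAX;   // no promotion, arithmetic stays unsigned
}
```

Both functions return true under the stated assumption: the small unsigned type produces a negative result, which a genuinely unsigned type never could.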

Recommendations

Sometimes you really do need unsigned integers. Unsigned int, unsigned long, and unsigned long long are all more or less safe since they’re never promoted. But if you use an unsigned type from the last section, or if you use generic code that expects an unsigned integer type of unknown size, that type can be dangerous to use due to promotion.

For C, there’s a workable solution to the unsigned integer promotion problem. Whenever you use a variable with unsigned type smaller than unsigned int, add 0u to it within parentheses. For example, when multiplying two unsigned short variables a and b, you can write (a+0u)*(b+0u). Since 0u has type unsigned int, adding it to a or b will effectively manually “promote” the operand to unsigned int if it has type smaller than unsigned int. This is useful both for mathematical operations and comparisons to ensure that operands won’t get unexpectedly promoted to type int. A disadvantage is maintainers may not understand its meaning when seeing it. An alternative you can use is to explicitly cast an operand to unsigned int, which works fine for unsigned char, unsigned short, uint8_t and uint16_t. Avoid using this particular solution on any operand of type uint32_t (or any even larger fixed width type), since unsigned int has an implementation defined size (of at least 16 bits) and this size could be smaller than uint32_t on some systems – potentially resulting in an undesired narrowing cast.
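The idiom works unchanged in C++, so here is a sketch of both variants written as C++ functions. The function names are made up for this example.

```cpp
#include <limits>

// Hypothetical helper showing the 0u idiom on a multiplication.
inline unsigned short multiply_with_0u(unsigned short a, unsigned short b)
{
   // (a+0u) and (b+0u) have type unsigned int, so the product wraps
   // around in a well-defined way instead of overflowing type int.
   return static_cast<unsigned short>((a + 0u) * (b + 0u));
}

// Hypothetical helper showing the explicit cast alternative.
// Only use this form on types no wider than unsigned int.
inline unsigned short multiply_with_cast(unsigned short a, unsigned short b)
{
   return static_cast<unsigned short>(
      static_cast<unsigned int>(a) * static_cast<unsigned int>(b));
}
```

Both functions are well defined for every pair of arguments, including the case where a and b both hold the maximum unsigned short value. The product is reduced modulo 2^N on the way back to unsigned short, which gives 1 in that case.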

For C++, there’s a fairly good solution. You can use the following helper class to get a safe type that you can use as a safe destination type for explicit casts on your unsigned (or generic) integer types during mathematical operations and comparisons. The explicit casting prevents implicit promotion of unsigned types to int. The effect is that unsigned types smaller than unsigned int will be (manually) promoted to unsigned int, and unsigned types larger than unsigned int will be unchanged. This helper class provides a safe and relatively easy way to achieve well-defined behavior with all unsigned integer types, as we’ll see by example.

#include <limits>

template <class T>
struct safely_promote_unsigned {
   static_assert(std::numeric_limits<T>::is_integer, "");
   static_assert(!std::numeric_limits<T>::is_signed, "");
public:
   // If T can be promoted, then 'type' will be the
   // unsigned version of T's promoted type.
   // Otherwise 'type' will be the same as T.
   using type = decltype(0u + static_cast<T>(0));
};
template <class T>
using safely_promote_unsigned_t =
            typename safely_promote_unsigned<T>::type;

To illustrate the use of safely_promote_unsigned_t, let’s write a template function version of Figure 3 that is free from any undefined behavior when T is an unsigned integer type:

template <class T>
T unsigned_multiply(T x, T y) 
{
   static_assert(std::numeric_limits<T>::is_integer, "");
   static_assert(!std::numeric_limits<T>::is_signed, "");
   using U = safely_promote_unsigned_t<T>;
   T result = static_cast<U>(x) * static_cast<U>(y);
   return result;
}
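To check what the helper actually produces, you can ask the compiler with a few static_asserts. The sketch below repeats the helper and the template function from above so that it compiles on its own. The only change is an explicit static_cast back to T, which is a stylistic choice of this sketch.

```cpp
#include <limits>
#include <type_traits>

// Repeated from above so that this sketch is self-contained.
template <class T>
struct safely_promote_unsigned {
   static_assert(std::numeric_limits<T>::is_integer, "");
   static_assert(!std::numeric_limits<T>::is_signed, "");
   using type = decltype(0u + static_cast<T>(0));
};
template <class T>
using safely_promote_unsigned_t =
            typename safely_promote_unsigned<T>::type;

template <class T>
T unsigned_multiply(T x, T y)
{
   using U = safely_promote_unsigned_t<T>;
   // The multiplication is done in U; the cast back to T is explicit here.
   return static_cast<T>(static_cast<U>(x) * static_cast<U>(y));
}

// Types narrower than unsigned int come out as unsigned int ...
static_assert(std::is_same<safely_promote_unsigned_t<unsigned char>,
                           unsigned int>::value, "");
static_assert(std::is_same<safely_promote_unsigned_t<unsigned short>,
                           unsigned int>::value, "");
// ... and types at least as wide as unsigned int are unchanged.
static_assert(std::is_same<safely_promote_unsigned_t<unsigned int>,
                           unsigned int>::value, "");
static_assert(std::is_same<safely_promote_unsigned_t<unsigned long long>,
                           unsigned long long>::value, "");
```

A call such as unsigned_multiply<unsigned short>(x, y) is then well defined for all argument values, with the result reduced modulo 2^N when it is converted back to unsigned short.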

Of course the best solution of all came from the introductory advice: use a signed integral type instead of unsigned types whenever you can.

Reference

The C++17 standard has multiple sections that involve integral promotion. For reference, here are the excerpts/summaries from the relevant parts of the C++17 standard draft:

7.6 Integral promotions [conv.prom]
1 A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (7.15) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.

8 Expressions [expr]
11 Many binary operators that expect operands of arithmetic or enumeration type cause conversions […These are] called the usual arithmetic conversions.
[… If neither operand has scoped enumeration type, type long double, double, or float,] the integral promotions (7.6) shall be performed on both operands.

8.3.1 Unary operators [expr.unary.op] (parts 7, 8, 10)
[For the unary operators +, -, ~, the operands are subject to integral promotion.]

8.6 Multiplicative operators [expr.mul]
[Binary operators *, /, %]
2 The usual arithmetic conversions are performed on the operands and determine the type of the result.

8.7 Additive operators [expr.add]
1 The additive [binary] operators + and - group left-to-right. The usual arithmetic conversions are performed for operands of arithmetic or enumeration type.

8.8 Shift operators [expr.shift]
[For the binary operators << and >>, the operands are subject to integral promotion.]

8.9 Relational operators [expr.rel]
[<, <=, >, >=]
2 The usual arithmetic conversions are performed on operands of arithmetic or enumeration type

8.10 Equality operators [expr.eq]
[==, !=]
6 If both operands are of arithmetic or enumeration type, the usual arithmetic conversions are performed on both operands

8.11 Bitwise AND operator [expr.bit.and]
1 The usual arithmetic conversions are performed;

8.12 Bitwise exclusive OR operator [expr.xor]
1 The usual arithmetic conversions are performed;

8.13 Bitwise inclusive OR operator [expr.or]
1 The usual arithmetic conversions are performed;


3 Responses to [C/C++] Surprises and Undefined Behavior From Unsigned Integer Promotion

  1. B.K. says:

    Bjarne Stroustrup and Herb Sutter both give absolutely awful advice. No surprises that they’re designer and advocate of one of the worst languages ever created


    • hurchalla says:

      They give great advice, but I have mixed feelings on the language and pragmatically sometimes it’s a good choice and sometimes it’s not. For yet another language creator… Dennis Ritchie once called C “quirky, flawed, and an enormous success”. Putting aside the success, C++ has all the quirks and flaws and many many more. And yet it makes a lot of sense to use from a practical point of view for the particular work I do. I’m open to change. Rust is interesting, but it’s a chicken and egg problem where I don’t want to invest time into something that won’t yet have large impact due to few people using it.


  2. alx says:

    Luckily, C2x will solve the `uint8_t` and `uint16_t` problem with `_BitInt(8)` and `_BitInt(16)`, which don’t promote to `int` automatically.

