As the previous post showed, it’s possible to correctly implement the Extended Euclidean Algorithm using one signed integral type for all input parameters, intermediate variables, and output variables. None of the calculations will overflow. The implementation was given as follows:

```
template <class T>
void extended_euclidean(const T a, const T b, T* pGcd, T* pX, T* pY)
{
static_assert(std::numeric_limits<T>::is_integer, "");
static_assert(std::numeric_limits<T>::is_signed, "");
assert(a >= 0 && b >= 0); // precondition
T x0=1, y0=0, a0=a;
T x1=0, y1=1, a1=b;
while (a1 != 0) {
T q = a0/a1;
T a2 = a0 - q*a1;
T x2 = x0 - q*x1;
T y2 = y0 - q*y1;
x0=x1; y0=y1; a0=a1;
x1=x2; y1=y2; a1=a2;
}
*pX = x0;
*pY = y0;
*pGcd = a0;
}
```

*Figure 1: C++ Extended Euclidean Function, Signed Integer Inputs.*

The precondition **a >= 0 && b >= 0** is required because the proof from the last post didn’t consider any cases where **a** or **b** was negative. I haven’t investigated whether the Extended Euclidean algorithm might be valid for any and all cases of negative inputs, so I won’t speculate about what might be possible for those cases. Instead let’s focus on the proven case **a >= 0 && b >= 0**.

It seems a bit wasteful to use signed integers for inputs **a** and **b** when the function precondition disallows them from ever being negative. Why not just use unsigned integer inputs that have the same precision as the signed integer inputs would have had, and double the permissible input range? If we need more persuasion, pretty likely our primary application for the Extended Euclidean Algorithm is to implement the modular multiplicative inverse. It’s common to use unsigned integral types for modular arithmetic. Why not increase compatibility by keeping to that convention?

It turns out it’s very much possible to correctly implement a modified version of the algorithm using unsigned inputs. You can skip directly to the implementation code or to the optimized version of that code if you like.

## Adapting the Algorithm to Use Unsigned Inputs

Changing the inputs in Figure 1 to unsigned types seems like it might be dangerous – we might get an overflow in one of the variables or calculations inside the function. The last post proved bounds for those variables and calculations, and for an all-signed implementation it nicely ruled out overflow. Those bounds will guide us now in changing Figure 1 to safely use unsigned inputs.

From the bounds proven in the last post, we know that for every loop iteration in Figure 1,

`0 <= q <= max(a,b)`

`0 <= a2 < max(a,b)`

Assuming we go ahead and change the type for inputs **a** and **b** to be unsigned, the bounds above make it clear that in order to be certain to avoid overflow we need to change **q** and **a2** to also use that same unsigned type. Looking at Figure 1, we can also see that the contents of **a2** propagate to **a1**, **a0**, and ***pGcd**. As a result, we need to change all those variables to use that same unsigned integer type.

The more problematic variables are **x2** and **y2** (and likewise **x1**, **y1**, **x0**, **y0**, ***pX**, and ***pY**, since the values of **x2** and **y2** propagate to them).

Displaying more of the bounds proven in the last post, throughout every loop iteration in Figure 1 we know that

`abs(x2) <= b`

`abs(y2) <= max(1,a)`

On the one hand, we can see that **x2** and **y2** must remain signed types, since the bounds above don’t prevent them from taking on negative values. But on the other hand, the bounds above allow them to become as large as **a** or **b**, and because **a** and **b** are now an unsigned type, the largest possible value for **a** or **b** would overflow the corresponding signed type (which we determined **x2** and **y2** must use). Fortunately we can resolve this conflict, by using less generally applicable but tighter bounds for these variables. Let’s refer to the second set of proven bounds from the last post. For every loop iteration other than the final iteration (when **a2** == 0), we know the following tighter bounds hold:

`abs(x2) <= max(1,b/2)`

`abs(y2) <= max(1,a/2)`

Superficially, it seems like these bounds should be enough to guarantee we can use signed types for **x2** and **y2** without any problems, since **b/2** and **a/2** will always fit in their corresponding signed type. The only loop iteration not covered by these bounds is the final iteration, and in Figure 1 we can see that the calculated values of **x2** and **y2** on that last iteration are never used anywhere. So seemingly overflow (if it happened) would be fine at that point. But in truth it’s unacceptable, because overflow of a signed integral type is undefined behavior in C and C++. Upon overflow on the last iteration, a reasonable possibly is everything would be okay, but another reasonable possibility is a trap would occur that crashes the program. Anything is allowed after invoking undefined behavior, so in the end it’s hopeless to try to reason about whether the program would be okay. The only safe thing to do is to view any undefined behavior as a fatal bug that must be fixed.

Since we can not allow undefined behavior, we must ensure signed integer overflow is completely impossible in the function. Fortunately this is easy – we can simply test if we are in the final loop iteration, through an early check of the loop conditional, and break out if so. Most of the work in the final iteration is irrelevant since it’s never used later. The only exception is the reassignments (**x0=x1; y0=y1; a0=a1**), which still need to be done before causing a break.

We’re not quite finished, but let’s incorporate all the changes discussed so far, so that we have an *in-progress* implementation:

```
template <class S, class U>
void extended_euclidean(const U a, const U b, U* pGcd, S* pX, S* pY)
{
static_assert(std::numeric_limits<S>::is_integer, "");
static_assert(std::numeric_limits<S>::is_signed, "");
static_assert(std::numeric_limits<U>::is_integer, "");
static_assert(!(std::numeric_limits<U>::is_signed), "");
static_assert(std::is_same<std::make_signed<U>::type, S>::value, "");
S x0=1, y0=0;
U a0=a;
S x1=0, y1=1;
U a1=b;
while (a1 != 0) {
U q = a0/a1;
U a2 = a0 - q*a1;
if (a2 == 0) {
x0=x1; y0=y1; a0=a1;
break;
}
S x2 = x0 - q*x1;
S y2 = y0 - q*y1;
x0=x1; y0=y1; a0=a1;
x1=x2; y1=y2; a1=a2;
}
*pX = x0;
*pY = y0;
*pGcd = a0;
}
```

*Figure 2: In-Progress (And Broken) Implementation, Unsigned Integer Inputs.*

Figure 2 has a hidden problem that we’ll need to fix. We determined above that **q** must be unsigned and **x1** must be signed, and indeed Figure 2 reflects this. As a result, during the calculation of **(q*x1)**, the compiler will implicitly convert **x1** to the unsigned type of **q** in order to carry out the calculation. Quoting from the C++17 standard section 8 [expr] (11.5.3):

*“Many binary operators that expect operands of arithmetic type cause conversions… The following rules shall be applied… if the operand that has unsigned integer type has rank equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.”*

[The C standard specifies very similar behavior.]

This is a problem – **x1** may legitimately hold a negative value, and yet the compiler will be implicitly converting it to an unsigned integer type! Converting a negative integer value to an unsigned type is well defined in the C and C++ standards, but it rarely does what a programmer wants when it’s unintentional. This conversion certainly wasn’t planned in Figure 2. For the multiplication **(q*x1)**, what we really wanted was to convert the unsigned **q** variable to signed, and to leave **x1** signed. Fortunately we can make this happen without much trouble by explicitly casting **q** to signed: for example, **static_cast<S>(q)**. This explicit cast will also ensure that the compiler will perform no implicit conversions on **x1**, leaving it as signed, since the types of **q** (after the cast) and **x1** will match.

The exact same problem we just discussed exists in Figure 2 for the interim calculation of **(q*y1)** too, and can be solved again with an explicit cast. No casts or changes are needed for the interim calculation **(q*a1)** – it has no problem with conversions since **q** and **a1** both have the same unsigned type.

Explicitly casting **q** to its corresponding signed integral type works well in the code of Figure 2. We know from the bounds proven in the last post that on every loop iteration except the final iteration,

`0 <= q <= max(a,b)/2`

It follows that for all loop iterations other than the final iteration, **q** will be less than or equal to the largest possible unsigned value divided by two. Thus, for those iterations, the value of **q** can be represented in the corresponding signed type, which means the explicit cast of **q** to that signed type will not overflow. On the final iteration we can see that Figure 2 has a break which occurs before it gets to the only interim calculations (**q*x1** and **q*y1**) which require the explicit cast of **q**. And so no casts of **q** will take place on the final loop iteration. Thus none of the casts of **q** will ever overflow the signed result type on any loop iteration.

Let’s incorporate the explicit casts of **q**, which will fix the code from Figure 2.

## A Correct Implementation

```
template <class S, class U>
void extended_euclidean(const U a, const U b, U* pGcd, S* pX, S* pY)
{
static_assert(std::numeric_limits<S>::is_integer, "");
static_assert(std::numeric_limits<S>::is_signed, "");
static_assert(std::numeric_limits<U>::is_integer, "");
static_assert(!(std::numeric_limits<U>::is_signed), "");
static_assert(std::is_same<std::make_signed<U>::type, S>::value, "");
S x0=1, y0=0;
U a0=a;
S x1=0, y1=1;
U a1=b;
while (a1 != 0) {
U q = a0/a1;
U a2 = a0 - q*a1;
if (a2 == 0) {
x0=x1; y0=y1; a0=a1;
break;
}
S x2 = x0 - static_cast<S>(q)*x1;
S y2 = y0 - static_cast<S>(q)*y1;
x0=x1; y0=y1; a0=a1;
x1=x2; y1=y2; a1=a2;
}
*pX = x0;
*pY = y0;
*pGcd = a0;
}
```

*Figure 3: Correct Implementation, Unsigned Integer Inputs.*

At this point, we have correct code in Figure 3. The remainder of this section proves that none of the interim calculations in Figure 3 can overflow, which finishes the code analysis and lets us optimize.

The interim calculations are **q*a1**, **q*x1**, and **q*y1**. Using the proven bounds from the last post, we can see that for every loop iteration the following bound holds:

`0 <= (q*a1) <= max(a,b)`

And for every loop iteration other than the final iteration (when **a2** == 0), the following bounds hold:

`abs(q*x1) <= max(1,b/2)`

`abs(q*y1) <= max(1,a/2)`

If the result of **q*a1** is representable in unsigned type U, then we know it does not overflow. The bound **0 <= (q*a1) <= max(a,b)** shows that this requirement is always satisfied, because **max(a,b)** is less than or equal to **std::numeric_limits<U>::max()**.

Similarly, if the result of **static_cast<S>(q)*x1** is representable in signed type S, then it does not overflow. We’ll need to analyze two cases: the case where this result is non-negative, and the case where it is negative. Let’s first define the constants **u_max = std::numeric_limits<U>::max()**, **s_max = std::numeric_limits<S>::max()**, and **s_min = std::numeric_limits<S>::min()**. The C/C++ standards effectively mandate that **u_max/2 <= s_max** and **-u_max/2 >= s_min**.

Let’s consider first the case where **0 <= static_cast<S>(q)*x1**. Repeating the bound given above, we know that for all loop iterations other than the final,

`abs(q*x1) <= max(1,b/2)`

For this case we can remove the absolute value from the bound, so that we have

`0 <= static_cast<S>(q)*x1 <= max(1,b/2)`

The input parameter **b** has unsigned type U, so we can expand this to

`0 <= static_cast<S>(q)*x1 <= max(1,b/2) <= u_max/2 <= s_max`

or in simplified form,

`0 <= static_cast<S>(q)*x1 <= s_max`

which shows us that for this case, **static_cast<S>(q)*x1** will always be representable in type S, provided that we are not in the final loop iteration.

Let’s now consider the other case, where **0 > static_cast<S>(q)*x1**. We can remove the absolute value from the bound, giving us for this case:

`-static_cast<S>(q)*x1 <= max(1,b/2)`

and thus

`0 > static_cast<S>(q)*x1 >= -max(1,b/2)`

The input parameter **b** has unsigned type U, so we can expand this to

`0 > static_cast<S>(q)*x1 >= -max(1,b/2) >= -u_max/2 >= s_min`

which shows us that for this case, **static_cast<S>(q)*x1** will always be representable in type S, provided that we are not in the final loop iteration.

Since in both cases **static_cast<S>(q)*x1** is always representable in type S, we know that this multiplication will not overflow, so long as we are not in the final loop iteration. If we are in the final loop iteration, Figure 3 shows that we will break out of the loop before reaching this calculation, and so no problems will be possible on the final loop iteration. The same reasoning as above can be applied to show that no overflow will occur in Figure 3 for the multiplication of **static_cast<S>(q)*y1**.

We’ve shown that none of the calculations or conversions in Figure 3 will overflow, and we can conclude that the implementation in Figure 3 is probably correct. You can see the Conclusion for a test project if you wish to run it.

The one fairly minor complaint we might have with Figure 3 is that the conditional branch inside the loop isn’t ideal from a performance standpoint. It would be nice if we could remove the conditional branch. Indeed we can, by re-arranging the code.

## Optimizing the Code

We can remove the conditional branch inside the loop of Figure 3 by rotating statements from the bottom of the loop to the top of the loop. Each rotation will require changing the code that precedes and follows the loop (and sometimes the loop condition too), so that the meaning of the function stays the same. Aside from the rotation, nothing inside the loop should change. The end goal of the rotations is to get the position of the nested conditional branch to be at the end of the loop, so that we can merge it into the loop condition.

Let’s first rotate the reassignments from the end of the loop in Figure 3 to the top of the loop. [If this description is hard to follow, it may help to compare Figure 3 with Figure 4.] The rotation affects the code after the loop, since **x0**, **y0**, and **a0** won’t be reassigned upon the end of the final loop iteration. So to compensate, ***pX**, ***pY**, and ***pGcd** will now need to be set to the values of **x1**, **y1**, and **a1**. This change to ***pX**, ***pY**, and ***pGcd** will need to also be valid if the loop is never taken at all, so we’ll need to change the initial values of **x1**, **y1**, and **a1** to the initial values in Figure 3 given to **x0**, **y0**, and **a0**. Fortunately, since we will have already done the rotation that places the loop reassignments at the top of the loop, this last change also ensures the correct values will get reassigned upon the first entry to the loop. At least in part. We haven’t discussed a second part of the effect of having the reassignments at the top of the loop: the reassignments use **x2**, **y2**, and **a2**, which are variables that haven’t been declared or initialized yet. We’ll fix this by adding declarations and initializations to the start of the function for **x2**, **y2**, and **a2**, setting them to the initial values that had formerly been given to **x1**, **y1**, and **a1** in Figure 3. As a result, upon first entry of the loop, the reassignments at the top of the loop will re-set the correct initial values, leaving the meaning of the function unchanged.

There’s one last detail. We’ll need to change the loop condition to refer to **a2** instead of **a1**, again due to the rotation of the reassignment statements. Since we’ll have already declared and initialized **a2** at the top of the function (to the initial value formerly used in Figure 3 by **a1**), this preserve the meaning of the function perfectly.

It’s easiest to see all this in actual code. By rotating the loop’s reassignments in Figure 3, we’ll have changed the implementation to the following code:

```
template <class S, class U>
void extended_euclidean(const U a, const U b, U* pGcd, S* pX, S* pY)
{
static_assert(std::numeric_limits<S>::is_integer, "");
static_assert(std::numeric_limits<S>::is_signed, "");
static_assert(std::numeric_limits<U>::is_integer, "");
static_assert(!(std::numeric_limits<U>::is_signed), "");
static_assert(std::is_same<std::make_signed<U>::type, S>::value, "");
S x1=1, y1=0;
U a1=a;
S x2=0, y2=1;
U a2=b;
while (a2 != 0) {
S x0=x1;
S y0=y1;
U a0=a1;
x1=x2; y1=y2; a1=a2;
U q = a0/a1;
a2 = a0 - q*a1;
if (a2 == 0)
break;
x2 = x0 - static_cast<S>(q)*x1;
y2 = y0 - static_cast<S>(q)*y1;
}
*pX = x1;
*pY = y1;
*pGcd = a1;
}
```

*Figure 4: Optimization Step 1, Unsigned Integer Inputs.*

We haven’t yet removed the conditional branch inside the loop, but we’re closer to the goal. We still need to rotate the calculations of **x2** and **y2** from the bottom of the loop to the top, and again fix the code before and after the loop to preserve the meaning of the function. As it turns out, after rotating the calculations of **x2** and **y2** there’s no need to change any code after the function. But now that the calculations are at the top of the loop, the variable **q** gets used before it has been declared or initialized. In order to preserve the meaning of the function, we’ll need to add a declaration and initialization of **q** at the start of the function. Initializing **q** to zero works quite well. On the first entry of the loop, the calculations of **x2** and **y2** will thereby contain a multiplication by zero (since **q** == 0), effectively reducing those calculations on the first iteration to reassignments from **x0** and **y0**. This means we’ll need to add to the start of the function declarations and initializations for **x0** and **y0**, setting them to the initial values that Figure 4 had used for **x2** and **y2**.

After making these changes to Figure 4, we have the following code:

```
template <class S, class U>
void extended_euclidean(const U a, const U b, U* pGcd, S* pX, S* pY)
{
static_assert(std::numeric_limits<S>::is_integer, "");
static_assert(std::numeric_limits<S>::is_signed, "");
static_assert(std::numeric_limits<U>::is_integer, "");
static_assert(!(std::numeric_limits<U>::is_signed), "");
static_assert(std::is_same<std::make_signed<U>::type, S>::value, "");
S x1=1, y1=0;
U a1=a;
S x0=0, y0=1;
U a2=b;
U q=0;
while (a2 != 0) {
S x2 = x0 - static_cast<S>(q)*x1;
S y2 = y0 - static_cast<S>(q)*y1;
x0=x1;
y0=y1;
U a0=a1;
x1=x2; y1=y2; a1=a2;
q = a0/a1;
a2 = a0 - q*a1;
if (a2 == 0)
break;
}
*pX = x1;
*pY = y1;
*pGcd = a1;
}
```

*Figure 5: Optimization Step 2, Unsigned Integer Inputs.*

We can now get rid of the conditional branch at the end of the function, since it’s redundant. Making this change and cleaning up the code a little, we get the following optimized function for the Extended Euclidean Algorithm with unsigned integer inputs:

## An Optimized Implementation

```
template <class S, class U>
void extended_euclidean(const U a, const U b, U* pGcd, S* pX, S* pY)
{
static_assert(std::numeric_limits<S>::is_integer, "");
static_assert(std::numeric_limits<S>::is_signed, "");
static_assert(std::numeric_limits<U>::is_integer, "");
static_assert(!(std::numeric_limits<U>::is_signed), "");
static_assert(std::is_same<std::make_signed<U>::type, S>::value, "");
S x1=1, y1=0;
U a1=a;
S x0=0, y0=1;
U a2=b, q=0;
while (a2 != 0) {
S x2 = x0 - static_cast<S>(q)*x1;
S y2 = y0 - static_cast<S>(q)*y1;
x0=x1; y0=y1;
U a0=a1;
x1=x2; y1=y2; a1=a2;
q = a0/a1;
a2 = a0 - q*a1;
}
*pX = x1;
*pY = y1;
*pGcd = a1;
}
```

*Figure 6: Optimized Implementation, Unsigned Integer Inputs.*

## Conclusion

The Extended Euclidean algorithm can be correctly and very efficiently implemented using unsigned integer inputs and signed integer outputs, so long as all the function parameters have the same precision. Figure 3 provides the basic implementation and Figure 6 provides the optimized implementation. There is a header file with the complete implementation, and a CMake file that will create a test project for the implementation if you wish to run it.

Hello!

I asked a question about modular inverses and unsigned integers on StackOverflow [1] and was pointed to your article.

Please, if I wanted to calculate the modular inverse, i.e. unsigned pX, would it be possible to get rid of `static_cast<S>` in `S x2 = x0 – static_cast<S>(q)*x1;` ?

In other words, for modular inverses, would it be possible to use only `U`?

And if that was possible, how would the final `if (x1 < 0) x1 += b0;` for correcting the result look like?

Thank you!

[1] https://stackoverflow.com/questions/53560302/modular-inverses-and-unsigned-integers

LikeLike

Hello!

I asked a question about modular inverses and unsigned integers on StackOverflow [1] and was pointed to your article.

Please, if I wanted to calculate the modular inverse, i.e. unsigned pX, would it be possible to get rid of `static_cast<S>` in `S x2 = x0 – static_cast<S>(q)*x1;` ?

In other words, for modular inverses, would it be possible to use only `U`?

And if that was possible, how would the final `if (x1 < 0) x1 += b0;` for correcting the result look like?

Thank you!

[1] https://stackoverflow.com/questions/53560302/modular-inverses-and-unsigned-integers

LikeLike

Hi Ecir,

The simplest way to implement the modular multiplicative inverse using only unsigned inputs and outputs would be to call the unsigned extended euclidean function from this post – you can use either the code of Figure 6 (the optimized version) or Figure 3 (the unoptimized but more straightforward version). You shouldn’t need to change that function code at all. More specifically, answering your first question, I don’t think you need to make any changes to the unsigned extended euclidean function that would get rid of the internal signed variables or internal signed casts. (Getting rid of the signed variables and casts probably isn’t even possible).

After calling the unsigned input extended euclidean function, in order to get a result that is >= 0, you’ll need to add the modulus to the result if the result would otherwise be less than 0. The addition won’t have any problems with overflow. Without going into detail on the C/C++ standards, the addition will work like you want even though it has signed and unsigned addends.

I haven’t compiled or tested the following code yet, but I expect it is fine and it should provide you an easy implementation of the modular multiplicative inverse, using all unsigned inputs and outputs.

I hope this helps. Thanks for your comment, If I have some time in the next few days, I’ll write a blog post here with details on calculating the modular multiplicative inverse and why it works.

LikeLike

Thank you very much for you answer! Please, one more question: wouldn’t it work if I was using only unsigned integers in the main loop and only at the end I would do something like `if (static_cast

~~(y1) < 0) y1 += modulus`? And thank you for the article, it was very informative!~~LikeLike

Thank you very much for you answer! Please, one more question: wouldn’t it work if I was using only unsigned integers in the main loop and only at the end I would do something like `if (static_cast(y1) < 0) y1 += modulus`? And thank you for the article, it was very informative!

(Sorry, the previous reply probably got formatted badly.)

LikeLike

Thank you very much for you answer! Please, one more question: wouldn’t it work if I was using only unsigned integers in the main loop and only at the end I would do something like `if (static_cast<S>(y1) < 0) y1 += modulus`? And thank you for the article, it was very informative!

(Sorry!! The comments seems to be eating the < sign!)

LikeLike

That’s an interesting question. Though I should mention first for the sake of simplicity and code that is easy to understand, I definitely believe the blog post’s unsigned extended euclidean function is your best solution, using code similar to my last comment reply. Subtle changes tend to invite bugs, and that same subtlety makes maintenance tricky!

Still… if we put that aside and pursue just a curiosity of what might be possible, it’s very interesting to try. I believe you can implement the modular multiplicative inverse using all unsigned types. The extended euclidean algorithm unfortunately requires some variables to hold signed values. But we can work-around this requirement by representing a signed value within an unsigned type: we would just need to interpret that (unsigned type) variable’s bit pattern as a two’s compliment signed value. Throughout all the arithmetic operations, there would be no changes needed despite using these now unsigned types for signed values. But when we initialize those variables we would need to do a manual conversion step into this representation, and we would need to do a manual conversion step out of this representation when we want to convert one of those variables back to a normally-interpreted type like one of the standard unsigned or signed integral types. Converting into this representation is easy – we just do a normal cast to the unsigned type – for example, int x = -1; unsigned int y = static_cast<unsigned int>(x). The C/C++ standards essentially require that this conversion results in a two’s complement bit representation. Converting out of this representation is slightly harder. In our particular application of the modular multiplicative inverse, we will want to convert out of this representation and into just a normally interpreted unsigned type. So long as at the point of the conversion the variable represents a positive or zero value, we can simply change how we interpret the variable (we no longer view it as two’s complement and instead view it as a normal unsigned value) – we don’t even need to do a cast, since the source and destination types are both the same unsigned type. At the point of conversion, the variable can not be allowed to be represent a negative value though, since in that case there would be no way to convert out of our representation and into a normally-interpreted unsigned type.

As you noted, and as I said in my earlier reply too, code for the modular multiplicative inverse should add the modulus to the result when the result is negative, and this will achieve a positive or zero value for the final result – which we would return from the function. Since we have a positive or zero value final result, if we were using the two’s complement interpretation I’ve discussed, at this point we would be safe to convert into normally-interpreted unsigned type! The only question we need to answer is how do we know if the (two’s complement representation) result variable is negative, so that we can decide if we need to add the modulus? The actual C/C++ type of that result variable is unsigned despite our interpretation of it, so obviously comparing that variable for less than zero wouldn’t do what we want. Unsigned variables are never less than zero, and the compiler doesn’t know anything about our unusual two’s complement interpretation of the variable. In order to know if our variable represents a negative value, we need to check if the the first bit of the variable is a 1. We can do this by shifting right by one less than the bit depth of our type, and then checking if the shifted value == 1. For example, if our type is unsigned int and the bit depth of that type is 32, then we would shift right by 31. And if the shifted value == 1, then we need to add the modulus to our (unshifted) result to get the final non-negative value.

Here’s what this idea looks like when implemented. I tested this following code only very lightly, but nevertheless it passed the basic tests, and the idea seems to work as desired.

LikeLike

Thank you very much yet again, very interesting!

One very last question (I hope!), instead of checking if the highest bit is set with the shift, shouldn’t temporarily casting `inv ` to signed type work as well? I mean, if unsigned int has highest bit set and I cast it into signed int it becomes negative so theoretically I could use `if (static_cast(inv) < 0)`, instead of `if ((inv >> (numbits – 1)) == 1)` or am I mistaken?

The advantage would be saving the shift and just interpreting `pInverseValue` as signed. The only question then is whether the magnitude of `inv` is less than `numbits – 1`, and I think you proved before that indeed it is.

But I understand that now it uses just unsigned types. Thanks once more!

LikeLike

Casting

invto the signed type wouldn’t be absolutely certain to work, since the result of casting a negative value to an unsigned type is implementation defined (the result depends on what the compiler documents it will do). In practice I suspect it would work on all compilers released in the last ten years, but the only way to get a guarantee with implementation defined behavior is to check the documentation of the compiler you are using. However for the upcoming C++20, this year the standards committee approved a change that mandates that all signed integral types must use two’s complement bit representation. I’ve heard that the C standards committee plans to do the same for the next version of C. To be clear I haven’t looked into this, but probably it means a simple cast ofinvwill be guaranteed safe for all C++20 compilers (and for some future version of C).There’s an alternative to checking if

invrepresents a negative value though. The earlier blog post on bounds for the Extended Euclidean Algorithm has a section called Final Bounds, which refers to the combined essential assertion results. One of those results wasn’t mentioned in the blog entry since it was only a step toward the final bounds discussed by the blog entry, but for our purposes now that assertion result can be helpful. It showed that at the end of the function,abs(y0) <= max(1,a/2)The code in my last comment reply was adapted from the Extended Euclidean implementation, so that bound applies to it.

y0corresponds toinv, andacorresponds tomodulusin the code adaptation, so we know that for that code,abs(inv) <= max(1,modulus/2)For this bound to hold, we need to interpret

invas a two’s complement signed value, even though the actual C/C++ type ofinvis unsigned.The bound we just established is useful for us now. It means that if we don’t do any casting or reinterpreting of

inv, and we observe that the actual C/C++ value (of unsigned integral type) that it contains is greater thanmax(1,modulus/2), the only possible explanation (other than a bug in the code) is that the signed two’s complement interpretation forinvmust be a negative value. If it were a positive value, it would have been less than or equal tomax(1,modulus/2), according to the bound. As it turns out, we can just compare tomodulus/2and ignore the max portion, since at the start of the code from the previous reply, there are early returns from the function ifmodulusis equal to 0 or 1. Therefore when we get to the comparison,modulus/2will be greater than or equal to 1, making the max() pointless.Our whole purpose for wanting to know if

invrepresents a negative value was so that we could know if we need to addmodulusto it. Keeping that in mind, we can express this alternative approach in code. The only thing that changes from the code in the previous reply is our conditional in the “if” statement:I didn’t use this alternative in the previous reply, because without an explanation it’s nearly impossible to see that it’s trying to test whether

invrepresents a negative value. It’s additional subtlety added to the code, which I don’t like. As I said before, I’d recommend against this entire approach due to the unnecessary subtlety of it. It’s fun to consider, but I don’t think there’s any good reason to actually use it. Code that’s non-obvious makes hidden bugs even less obvious! For example… the code in my last reply had potential undefined behavior that was really well hidden, and I’ve just now edited that reply to fix the potential UB. I’ll explain the reason why the code previously had potential undefined behavior in another blog post [as a brief explanation: during arithmetic operations C/C++ promotes unsigned integral types to the type int, if the original unsigned type is smaller than int – and type int has undefined behavior if it overflows].LikeLiked by 1 person

Thank you for the very clear explanation!

LikeLike