Big Integers: Addition and Subtraction
Devlog 1: Figuring out how numbers are best represented
Preface
I had the idea of implementing my own arbitrary-precision arithmetic library a while back, when I was trying to calculate $\pi$ on my computer. Back then I was doing it in Python, which has a native big-integer implementation built into the language. However, it has no native big floating-point type, and since $\pi$ is a floating-point number, I needed an external library that provided such functionality.
Since I was also trying to learn C++ at the time, I thought that instead of downloading a pre-built library, I would take on the challenge of implementing arbitrary-precision floating-point numbers myself. I decided to implement big integers first, since they are the easiest part. Since then, I’ve discovered, and had the joy of fixing, the annoying bugs and issues that arose in the process. Let us discuss them in this post.
Initial Idea¹
One approach for big integers is this: since all numbers in computers are stored in binary (base 2), we can store any number in a dynamic array of booleans using `std::vector<bool>`, with the sign of the number kept in a separate `bool` variable. This way, we can resize the array to represent a number of any size, without being limited to 32 or 64 bits.
For example, we could represent two integers $ x $ and $ y $ in binary via the following vectors of length $n$ and $m$:
$$ x = \lbrace x[0], x[1], \dots, x[n - 1] \rbrace \newline y = \lbrace y[0], y[1], \dots, y[m - 1] \rbrace $$
Each element of the vectors represents a digit of our number in base 2, so $x[i]$ and $y[i]$ can either be $0$ or $1$. We’ll store the vectors in little-endian order for convenience (that is, the least significant digit comes first). Then, the values of $x$ and $y$ in base 10 can be calculated via the following, respectively:
$$ \sum_{i = 0}^{n - 1} 2^{i} \cdot x[i] \qquad \sum_{i = 0}^{m - 1} 2^{i} \cdot y[i] $$
With this representation, the addition algorithm to calculate the answer $a = x + y$ is simple. First, we append zeros to the end of the shorter of $x$ and $y$ so that both have length $\text{max}(n, m)$. Then, for each $0 \le i < \text{max}(n, m)$, we can calculate the $i$-th digit of the sum $a[i]$ using:
$$ a[i] = (x[i] + y[i] + c[i]) \mod 2 \newline c[i + 1] = \bigg\lfloor \dfrac{x[i] + y[i] + c[i]}{2} \bigg\rfloor \quad (c[0] = 0) $$
where $c[i]$ is the $i$-th carry; if the final carry $c[\text{max}(n, m)]$ is $1$, it becomes a new most significant digit of $a$.
Performance Issues
I implemented this idea and decided to test its performance. It turned out to be really slow. For example, I tried to calculate $2^{65536}$ using repeated addition via the following code:
```cpp
BigInt calculate() {
    BigInt n = 1;
    for (int i = 0; i < 65536; i++)
        n = n + n; // <- Equivalent to n *= 2
    return n;
}
```
This implementation chugged along for about 20 seconds before spewing out a number. In comparison, Python’s integer implementation calculated the same number in about 150 ms. That is a huge performance difference; clearly, there is some work to do before my library can go on to calculate $\pi$.
To improve the performance, I speculated that any of the following could be causing our implementation to be so much slower than Python:
- Our addition algorithm is inefficient.
- Python was doing some funky optimization by recognizing that we’re computing some integer power using repeated addition and using its fast power algorithm instead.
- The `std::vector` class was storing the booleans in a weird way where each element takes up more than one byte, therefore causing more computations.
I quickly disproved these speculations:
- Our addition algorithm already runs in $O(n)$ time, and I didn’t think that could be optimized further.
- The addition-to-power trick sounds a bit too advanced for the Python interpreter to detect. Besides, I had the interpreter optimizations off anyway.
- The C++ standard doesn’t specify a boolean to be one bit, but it does specify `std::vector<bool>` to be a space-optimized specialization that stores each boolean in a single bit.
Python’s Implementation
Without further leads as to why the implementation was so slow, I turned to the internet for clues. It turns out Python’s internal representation of big integers uses base $2^{30}$. This means each digit can represent an integer from $0$ to $2^{30} - 1$.
I was confused at first, since it’s not normal to use roughly a billion as the base of a number system. However, it made sense after a while: the higher the base, the more bits the dynamic array can store with the same number of elements. For example, a 90-bit integer requires 90 elements in our boolean implementation, while Python’s implementation needs only 3. Therefore, adding two 90-bit integers takes 90 digit operations in our implementation, but only 3 in Python’s.
Since the CPU’s built-in ALU is much faster at arithmetic than our manual loop over an array of booleans, we want each element of the dynamic array to represent as many bits as possible. This is why Python picked such a high base.
In terms of addition logic, not much changes; instead of taking the result modulo $2$, we take it modulo whatever our storage base is. In this case, let’s choose $2^{32}$ as our base, so that each element is one `std::uint32_t`:
$$ a[i] = (x[i] + y[i] + c[i]) \mod 2^{32} \newline c[i + 1] = \bigg\lfloor\dfrac{x[i] + y[i] + c[i]}{2^{32}} \bigg\rfloor \quad (c[0] = 0) $$
When I ran the speed test again with the new implementation, our algorithm’s performance surpassed Python’s! The speed increase likely came from using 2 more bits per “digit”².
Final Code
Here is the final internal implementation of integer addition for big integers represented in base $2^{32}$. I also went ahead and implemented the subtraction algorithm, which is similar in logic. Note that these functions do not handle the sign of the integer and naively assume that $a, b \ge 0$ for addition and $a \ge b \ge 0$ for subtraction.
```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Little-endian dynamic array of base-2^32 digits.
using storage_t = std::vector<uint32_t>;

// Computes a + b, assuming a, b >= 0 (signs are handled elsewhere).
inline storage_t add_int(const storage_t& a, const storage_t& b) {
    const uint64_t BASE = (1ULL << 32);
    const size_t n = std::max(a.size(), b.size());
    uint64_t carry = 0;
    storage_t result(n);
    for (size_t i = 0; i < n; i++) {
        // Treat the shorter number as if it were padded with zeros.
        const uint64_t ai = (i < a.size()) ? a[i] : 0;
        const uint64_t bi = (i < b.size()) ? b[i] : 0;
        const uint64_t sum = ai + bi + carry;
        result[i] = static_cast<uint32_t>(sum & (BASE - 1)); // sum mod 2^32
        carry = (sum >= BASE);                               // floor(sum / 2^32)
    }
    if (carry)
        result.push_back(1);
    return result;
}

// Computes a - b, assuming a >= b >= 0 (signs are handled elsewhere).
inline storage_t sub_int(const storage_t& a, const storage_t& b) {
    const uint64_t BASE = (1ULL << 32);
    const size_t n = a.size(); // a >= b, so a has at least as many digits
    uint64_t borrow = 0;
    storage_t result(n);
    for (size_t i = 0; i < n; i++) {
        const uint64_t ai = a[i];
        const uint64_t bi = (i < b.size()) ? b[i] : 0;
        // Unsigned wrap-around plus the mask below yields the correct digit.
        const uint64_t diff = ai - bi - borrow;
        result[i] = static_cast<uint32_t>(diff & (BASE - 1));
        borrow = (ai < (bi + borrow));
    }
    // Trim leading zero digits (the empty vector represents zero).
    while (!result.empty() && result.back() == 0)
        result.pop_back();
    return result;
}
```
You can also view the full project on GitHub.
Footnotes

1. This section is a bit math-notation heavy, which is really just me wanting to try out $\LaTeX$ formatting in Hugo. ↩︎
2. For actual speed comparisons in seconds, check out the documentation for the library. ↩︎
Have a comment or a question about this post? Reply by email!