A couple months ago I decided to implement the fundamental arithmetic operations for integers with arbitrary precision. The main reason was mostly curiosity than need, since there are plenty of good libraries focused on this. Still, I thought it would be worth the exercise and the fun.
Originally I was after something quite simplistic, but along the way the code started gaining more form and it eventually became a C++ class with well defined API. Basically, the logical connections were prompting, one thing led to another, and in the end I implemented more than I actually planned.
The reference I used was the volume for Seminumerical Algorithms from Knuth‘s book The Art of Computer Programming. It’s cool that although the book does worry about giving implementation tips, there still room for many questions that appear in the process of converting the algorithmic descriptions into the final C++ code.
Interesting things I observed…
- Prior to anything else, the first step was to think about which model should the representation rely on. Well, in the binary world the common option is two’s complement. However, in an arbitrary precision case from a higher level the numeric base is typically a big integer, so the sign and magnitude approach seems to be a good fit.
- Considering sign and magnitude, there’s still a flexible aspect which is how exactly to handle the zero. Explicitly qualify it or use only the two signs and infer it from the values? I found it simpler and more consistent to have an extra attribute which tells whether the number is positive, negative, or the zero.
- Which data-structure and where to place the most significant digits? Normally a vector is a good option. It provides random access, it’s possible to control allocation with certain predictability, no space overhead, memory is contiguous and there’s a “natural” mapping on how we do operations. Regarding the position of the digits, placing the most significant ones at the end of the vector is the conventional way, since it allows the integer to increase/decrease in an efficient manner.
- Following on the paragraph above, there’s the need to eliminate leading zeros which might result from some operations. This easies the implementation in general. Besides that, because the zero is not represented only through the magnitude I had to add a few “artificial” verifications after some calculations.
- Addition, subtraction, and multiplication are quite intuitive. Nothing really elaborate, the only care needed is when manipulating the carry. On the other hand, the division with remainder is a different story. Obviously you can go for an algorithm of consecutive subtractions, but I wanted something a bit fancier. The algorithm presented on Knuth’s book is based on normalizing the divisor and dividend and on the estimation of a value for the quotient. Some details are non-trivial and I would say there’s a potential for errors in the implementation. ;-) Note: There are two corrections from the algorithm available in the errata.
- Going out of the plan… It turns out that for the division with remainder I had to implement the shift left/right operations. They are recommended for the normalization phase. The other operations which are a natural evolution for the sake of usability are the conversions between the base representation and the decimal base. This is in order to make construction and printing of the integers easy through a string.
- Testing, of course. After all you need to make sure somehow that things are working as expected. Since in this case we are dealing with arbitrary precision you need an auxiliary tool. For instance a powerful calculator or maybe another big integer library.
Well, I had already published the code a while ago, but only now I decided to blog about it. Oh, and take a look at the notes…
You can find it here: vlint (Very Long Integer)
Tchau!
Leandro Melo