Saturday, September 7, 2013

Computing a Square Root

Most of the time a programmer gives little thought to how mathematical functions like sine, cosine, or square root are implemented. In C, the standard library provides an implementation of these functions (which often involves little more than invoking the appropriate hardware instruction), and most other languages simply wrap their own API calls around these C counterparts. Thus, unless the language provides a library that implements these functions to arbitrary precision, the underlying hardware limits the precision of these operations within the programming language. Because of this limitation, one might wonder how these functions are implemented when the precision required is more than what the hardware supports. This question arises perhaps most often within the field of cryptography.

Ruby's Implementation

In Ruby, we have the Math module to provide many of the common mathematical functions. Each Math function accepts an argument of class Float, and if the argument is not a Float, a conversion is performed.

>> Math.sqrt(5)
=> 2.23606797749979
>> Math.sqrt(5.0)
=> 2.23606797749979


On my desktop, Ruby's Float is a double-precision value, which has more than sufficient precision for most applications. But it does have its limits.

>> x = 100
=> 100
>> Math.sqrt(2**x) - 2**(x/2)
=> 0.0
>> x = 1000
=> 1000
>> Math.sqrt(2**x) - 2**(x/2)
=> 0.0
>> x = 2000
=> 2000
>> Math.sqrt(2**x) - 2**(x/2)
=> Infinity
>> Float(2**x)
=> Infinity


Because Ruby's integers have arbitrary precision while double-precision floating-point values permit 52 bits for the fractional part of the significand (53 significant bits, counting the implicit leading bit), 11 bits for the exponent, and one bit for the sign, the example above "chokes" once the exponent reaches 1024, at which point the value overflows to Infinity. So how might we compute the square root of a large integer?
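
Ruby exposes these limits through constants on the Float class (MANT_DIG counts the implicit leading bit of the significand):

>> Float::MANT_DIG
=> 53
>> Float::MAX_EXP
=> 1024
>> 2.0 ** 1023
=> 8.98846567431158e+307
>> 2.0 ** 1024
=> Infinity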

One way would be to use the BigDecimal class from Ruby's "standard" library, which implements many of these functions for arbitrary-precision numbers (a quick example is shown below). But that provides no insight into how we might implement a mathematical function like this on our own. For this reason, we will consider an efficient implementation of square root, as well as how that implementation can be derived.
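
For reference, the BigDecimal route is nearly a one-liner; its sqrt method accepts the number of significant digits to compute:

    require 'bigdecimal'
    BigDecimal("2").sqrt(50)   # sqrt(2) to (at least) 50 significant digits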

The Integer Square Root Problem

We will only consider the "integer square root problem" which asks, given a positive integer $S$, what is the largest integer $x$ such that $x^2 \leq S$. For example, if $S = 15$ then the solution is $x = 3$, since $3^2 = 9 < 15$ and $(3+1)^2 = 16 > 15$. Once we have a solution for the integer square root problem, we can extend it to approximate the square root of a fractional number $r$ by first multiplying $r$ by $2^{2j}$ (which is typically just a bit-shift), computing its integer square root, then dividing by $2^j$. You can see why this might work from the equation $\sqrt{r} = \left(\sqrt{r2^{2j}}\right)/2^j$. Any desired precision can be obtained by increasing the value of $j$.
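
As a quick sketch of that extension, a small helper (call it sqrt_approx; it relies on the Integer#isqrt method we define below) makes the idea concrete:

    # Approximate sqrt(r) for a Rational r: scale by 2^(2j), take the
    # integer square root, then divide by 2^j. Larger j gives more precision.
    def sqrt_approx(r, j)
      scaled = (r * (1 << (2 * j))).to_i   # floor(r * 2^(2j))
      Rational(scaled.isqrt, 1 << j)       # isqrt(r * 2^(2j)) / 2^j
    end

    sqrt_approx(Rational(1, 2), 60).to_f   # => 0.7071067811865476, matching Math.sqrt(0.5)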

Before I get all "academic" about how to solve this problem, let's just look at the resulting implementation. We will discuss the whys and hows below.
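
Here is a minimal sketch of such a method, defined by reopening Integer (the guard clauses and termination test are one reasonable arrangement; the update step and the initial guess are derived below):

    class Integer
      # Integer square root: the largest x such that x**2 <= self,
      # computed with Newton's method using only integer arithmetic.
      def isqrt
        raise ArgumentError, "isqrt requires a non-negative integer" if self < 0
        return self if self < 2              # isqrt(0) = 0, isqrt(1) = 1
        x_k = self >> 1                      # initial guess: self / 2
        loop do
          next_x_k = (x_k + self / x_k) >> 1 # the Newton step derived below
          return x_k if next_x_k >= x_k      # stop once the iterates stop decreasing
          x_k = next_x_k
        end
      end
    end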


The implementation above has a strange loop in it.  It's not at all obvious why that loop works or how someone could derive such a solution. But it does work, and it works rather efficiently:

>> (1..100000).each do |s|
?> x = s.isqrt
>> unless x**2 <= s && (x+1)**2 > s
>> puts "ERROR: #{s} => #{x}"
>> end
>> end
=> 1..100000
>> require 'benchmark'
=> true
>> x = 2000
=> 2000
>> (2**x).isqrt - 2**(x/2)
=> 0
>> puts Benchmark.measure { (2**x).isqrt }
0.000000 0.000000 0.000000 ( 0.007571)
=> nil
>> x = 20000
=> 20000
>> puts Benchmark.measure { (2**x).isqrt }
0.910000 0.000000 0.910000 ( 0.914303)
=> nil


In the irb interaction above, I test that the method returns the correct result for the first 100,000 integers. This is by no means a "proof of correctness", but it does provide enough confidence to continue discussing the implementation as if it were correct. Next, I compute the square root of a 2000-bit number, and then do the same for a 20000-bit number. (Looking at a few more large examples, the time required appears to grow at a rate close to $O(x^{2.5 + \epsilon})$, where $x$ is the number of bits. I believe the number of iterations of the loop above is bounded by $O(\sqrt{x})$, but the time required for each iteration grows with the size of the number, making this factor multiplicative with the cost of the most expensive arithmetic operation. Here you can see that a factor-of-10 increase in the number of bits produced a more than 100x increase in the time required; subsequent doublings each seemed to increase the time by a factor of about 7. Note that for a number $S$, the number of bits $x$ required for its storage is at most $\log_2(S) + 1$.)

Newton's Method

The source code listed above demonstrates a common method of approximating square roots or other "algebraic numbers". It's called Newton's Method, and is one of just a few topics from numerical analysis that the typical computer science (CS) undergraduate will see before graduating (or ever for that matter). It's usually demonstrated during Calculus I, before a CS student is fully aware that computer science is more than just programming.

In general, Newton's method provides a sequence of approximations, each typically closer to the solution than the last (though this improvement is not guaranteed at every step). To use this method, we must first obtain a polynomial that has a "root" (or a "zero") at the value we are trying to compute. For a "square root" this is rather simple. Since we want to find the largest $x$ such that $x^2 \leq S$, the polynomial we need is $f(x) = x^2 - S$. We will search for a positive value $x$ such that $f(x) \leq 0$ and $f(x+1) > 0$, as the snippet below illustrates.
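
Stated in Ruby (a throwaway illustration; this lambda is not part of the implementation above):

    # The search criterion for S = 15: f(3) <= 0 but f(4) > 0, so x = 3.
    f = lambda { |x, s| x**2 - s }
    f.call(3, 15)   # => -6
    f.call(4, 15)   # => 1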

Starting with an initial "guess", $x_0$, the sequence follows from:
\[
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.
\]
The reason this method works cannot be seen by looking at the formula above, but instead by looking at the graph of the function $f(x)$. Consider $(x_0, f(x_0))$ to be the point on the graph that is our initial guess at the solution. Taking the line tangent to the graph at this point and locating where it crosses the $x$-axis, $(x_1, 0)$, we obtain what will be our next guess. Using point-slope form $y - y_0 = m(x - x_0)$ and the fact that the derivative of our function ($f'$) at this point ($x_0$) gives us the slope, we can then solve for $x_1$. Starting with $y_1 - y_0 = m(x_1 - x_0)$, we apply our substitutions: \[ 0 - f(x_0) = f'(x_0)(x_1 - x_0), \] then \[ -f'(x_0)x_1 = f(x_0) - f'(x_0)x_0, \] and finally, \[ x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}, \] demonstrating a derivation of the sequence above. (The image below, from Wikipedia, visually demonstrates how this method's successive approximations approach the root.)
[Figure: animation of Newton's method converging to a root, from Wikipedia ("NewtonIteration Ani")]
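
To make the convergence concrete, here are a few floating-point Newton iterations for $S = 15$, starting from $x_0 = 7$:

    # Plain (floating-point) Newton's method for f(x) = x**2 - 15.
    s, x = 15.0, 7.0
    5.times do
      x = x - (x**2 - s) / (2 * x)
      puts x
    end
    # Prints approximately: 4.5714, 3.9263, 3.8733, 3.87298, 3.87298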

For the integer square root problem, $f(x_k) = x_k^2 - S$ and $f'(x_k) = 2x_k$. Applying Newton's method we obtain
\[
\begin{aligned}
x_{k+1} &= x_k - \frac{x_k^2 - S}{2x_k}\\
        &= \frac{x_k^2 + S}{2x_k}\\
        &= \frac{1}{2}\left(x_k + \frac{S}{x_k}\right).
\end{aligned}
\]
From this we see how the update step inside the loop is derived:
    next_x_k = (x_k + self/x_k) >> 1
The ">> 1" is a bit shift, or equivalently, an integer division by 2, and the initial "guess" that we use is simply the number divided by 2.
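
For example, tracing the loop for $S = 15$ (with the isqrt sketch above):

    s = 15
    x_k = s >> 1              # initial guess: 7
    (x_k + s / x_k) >> 1      # => 4  (15 / 7 == 2 under integer division)
    (4 + s / 4) >> 1          # => 3  (the answer: 3**2 <= 15 < 4**2)
    (3 + s / 3) >> 1          # => 4  (iterates stopped decreasing, so 3 is returned)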

To fully verify the correctness of my implementation, I would need to account for the nature of Ruby's integer division (which yields an integer, discarding the remainder) and show that the loop does indeed terminate at a value satisfying the criteria. But none of that is needed (nor would it be appropriate in this context) for us to see how Newton's method might be utilized to find an efficient implementation of this and many other mathematical functions for arbitrary-precision numbers.

1 comment:

  1. Hi, if you are only interested in square roots, I would refer you to the Babylonian method, which is the same as Newton's but

    a) older
    b) simpler to understand (take two approximate values, compute their mean, rinse repeat)

    Going Newton-Raphson, derivative etc... just for square roots is rather strong *as I see it*.

    Anyway, thanks for the post.


    Pedro.
