Improving performance

We'll look at two performance tweaks for the Power3 class.

First, a better algorithm. Then, a better algorithm combined with memoization, which involves a cache; therefore, the function becomes stateful. This is where callable objects shine.

The first modification is to use a Divide and Conquer design strategy. The previous version chopped Improving performance into n steps; the loop carried out n individual multiplication operations. If we can find a way to split the problem into two equal portions, the problem decomposes into Improving performance steps. Given pow1(2,1024), the Power1 callable performs the calculation 1024 multiplications by 2. We can optimize this down to 10 multiplications, a significant speedup.

Rather than simply multiplying by a fixed value, we'll use the "fast exponentiation" algorithm. It uses three essential rules for computing Improving performance, as follows:

  • If Improving performance:Improving performance, the result is simply 1.
  • If n is odd and Improving performance, the result is Improving performance. This involves a recursive computation of Improving performance. This still does a multiplication but not a real optimization.
  • If n is even and Improving performance, the result is Improving performance. This involves a recursive computation of Improving performance. This chops the number of multiplications in half.

The following is the recursive callable object:

class Power4( abc.Callable ):
    def __call__( self, x, n ):
        if n == 0: return 1
        elif n % 2 == 1:
            return self.__call__(x, n-1)*x
        else: # n % 2 == 0:
            t= self.__call__(x, n//2)
            return t*t

pow4= Power4()

We applied the three rules to the input value. If n is zero, we'll return 1. If n is odd, we'll make a recursive call and return Improving performance. If n is even, we'll make a recursive call and return Improving performance.

The execution time is dramatically faster. We can use the timeit module to see the difference in performance. See Some Preliminaries, for information on using timeit. When we compare running pow1(2,1024) and pow4(2,1024) 10,000 times, we'll see something like 183 seconds for the previous version versus 8 seconds for this version. We can do better, however, with memoization.

The following is how we can gather performance data using timeit:

import timeit
iterative= timeit.timeit( "pow1(2,1024)","""
import collections.abc
class Power1( collections.abc.Callable ):
    def __call__( self, x, n ):
        p= 1
        for i in range(n):
            p *= x
        return p

pow1= Power1()
""", number=100000 ) # otherwise it takes 3 minutes
print( "Iterative", iterative )

We imported the timeit module. The timeit.timeit() function will evaluate a given statement in the defined context. In this case, our expression is the simple pow1(2,1024) expression. The context for this statement is the definition of the pow1() function; it includes the import, class definition, and creation of the instance.

Note that we provided number=100000 to speed things up. If we had used the default value for the number of iterations, it could have taken almost 2 minutes.