Quick Tutorial: Python Multiprocessing

I’m going to take a break from webhooks and project updates to give a quick tutorial about multiprocessing. Multiprocessing is very useful for any sort of networking-focused code, and most tutorials I found dealt with just that aspect. However, it can also be a great way to speed up code with a lot of loops, and to let a script keep printing status messages while it runs for a long time. I decided to write a quick tutorial about those aspects to fill this apparent gap.

I won’t go into detail about what exactly multiprocessing is, but I do recommend that you look for more in-depth tutorials. The general gist is that multiprocessing lets you run several functions at the same time in separate processes. For this tutorial, we are going to use it to make a loop faster by splitting it into a number of smaller loops that all run in parallel.

We’re going to start with this sample function. It computes the sum of the squares of a list of numbers, but waits 0.1 seconds on each iteration.

import time

def f(a_list):
    out = 0
    for n in a_list:
        out += n*n
        time.sleep(0.1)

    return out
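To see the slowdown for yourself, here is a minimal timing sketch using time.perf_counter. The helper timed and the 10-element input are my own illustrative choices, not part of the original code.

```python
import time

def f(a_list):
    # Same sample function: sum of squares with a 0.1 s pause per element
    out = 0
    for n in a_list:
        out += n*n
        time.sleep(0.1)
    return out

def timed(func, *args):
    # Hypothetical helper: call func(*args) and return (result, elapsed seconds)
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    result, elapsed = timed(f, list(range(10)))
    # Expect roughly 10 * 0.1 = 1 second, plus a little overhead
    print(f"sum of squares = {result}, took {elapsed:.2f} s")
```

The run time grows linearly with the length of the list, which is what makes long lists painful.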

Now for small lists this is fine, but with dozens or hundreds of elements the run time becomes excessive. One way to make it faster is to break the list up into more manageable chunks and use multiprocessing to run a bunch of shorter loops at the same time. Here is some code to do that.

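import time
from multiprocessing import Pool

def f(a_list):
    # Sum of squares, pausing 0.1 s per element to simulate slow work
    out = 0
    for n in a_list:
        out += n*n
        time.sleep(0.1)

    return out

def f_mp(a_list):
    # Split the list into 5 interleaved sublists: a_list[0::5], a_list[1::5], ...
    chunks = [a_list[i::5] for i in range(5)]

    # Run f on each chunk using 5 worker processes;
    # the with-block shuts the pool down when we're done
    with Pool(processes=5) as pool:
        result = pool.map(f, chunks)

    # Each worker returns a partial sum, so add them up
    return sum(result)

if __name__ == "__main__":
    # This guard is required where workers are started with "spawn" (e.g. Windows, macOS)
    print(f_mp(list(range(50))))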

f_mp takes a list and splits it into five sublists of roughly equal size: a_list[i::5] takes every fifth element starting at index i, so the sublist sizes differ by at most one. It then sets up a Pool object and tells it to use 5 worker processes. pool.map(function, list_of_arguments) runs function on the process pool once for each item in list_of_arguments. For example, pool.map(f, [l1, l2, l3]) would run f(l1), f(l2) and f(l3) at the same time and return the list [f(l1), f(l2), f(l3)], in the same order as the inputs. Since my aim is to get the sum of the squares, I sum over the output list to get my answer.
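To make the splitting step concrete, the sketch below compares the interleaved slicing used in f_mp with contiguous chunks, and checks that adding up the per-chunk sums gives the same answer as one big sum. The helper names split_interleaved, split_contiguous and sum_of_squares are hypothetical, and the built-in map stands in for pool.map here, since both return results in input order.

```python
def split_interleaved(a_list, n):
    # The approach used in f_mp: chunk i gets elements i, i+n, i+2n, ...
    return [a_list[i::n] for i in range(n)]

def split_contiguous(a_list, n):
    # Alternative: consecutive runs of ceil(len / n) elements (at most n chunks)
    size = max(1, -(-len(a_list) // n))
    return [a_list[i:i + size] for i in range(0, len(a_list), size)]

def sum_of_squares(a_list):
    # f without the sleep, so the check runs instantly
    return sum(n * n for n in a_list)

if __name__ == "__main__":
    data = list(range(12))
    print(split_interleaved(data, 5))
    print(split_contiguous(data, 5))
    # Either split gives the same total once the partial sums are added up
    for chunks in (split_interleaved(data, 5), split_contiguous(data, 5)):
        assert sum(map(sum_of_squares, chunks)) == sum_of_squares(data)
```

Interleaving keeps the chunk sizes within one element of each other, whereas contiguous chunks with this rounding can produce fewer than n chunks (four here, for 12 elements and n = 5).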

Running the above with a 50-element list, f takes ~5.4 seconds on my machine, while f_mp takes ~1.4 seconds. This lines up with our expectations: f_mp splits the list into five 10-element chunks and runs f on them concurrently, so it should spend about 1 second waiting versus 5 seconds for the 50-iteration version.
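The back-of-the-envelope estimate generalizes: with N elements split across p workers, the slowest chunk has ceil(N / p) elements, so the expected waiting is about ceil(N / p) × 0.1 seconds. The sketch below encodes that estimate and measures it. To keep it runnable anywhere (the multiprocessing docs note that Pool examples may not work in the interactive interpreter, because workers must be able to import __main__), it swaps in multiprocessing.pool.ThreadPool, which has the same map interface but uses threads. That only speeds things up here because time.sleep releases the GIL; for genuinely CPU-bound loops you would use Pool as in f_mp. The names expected_wait and f_threads are hypothetical.

```python
import math
import time
from multiprocessing.pool import ThreadPool

def f(a_list):
    # Sum of squares with a 0.1 s pause per element (sleep releases the GIL)
    out = 0
    for n in a_list:
        out += n*n
        time.sleep(0.1)
    return out

def expected_wait(n_items, n_workers, delay=0.1):
    # The slowest worker handles ceil(n_items / n_workers) elements
    return math.ceil(n_items / n_workers) * delay

def f_threads(a_list, n_workers=5):
    # Same split-and-map pattern as f_mp, but on a thread pool
    chunks = [a_list[i::n_workers] for i in range(n_workers)]
    with ThreadPool(processes=n_workers) as pool:
        return sum(pool.map(f, chunks))

if __name__ == "__main__":
    print(f"expected: serial {expected_wait(50, 1):.1f} s, 5 workers {expected_wait(50, 5):.1f} s")
    start = time.perf_counter()
    total = f_threads(list(range(50)))
    print(f"threads: {total} in {time.perf_counter() - start:.2f} s")
```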

Another feature of multiprocessing is that you can use an asynchronous map instead, which lets other commands run while you wait for the loop to finish. For example, the code below prints “Running...” to the console every half second until the loop completes:

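import time
from multiprocessing import Pool

def f(a_list):
    out = 0
    for n in a_list:
        out += n*n
        time.sleep(0.1)

    return out

def f_amp(a_list):
    chunks = [a_list[i::5] for i in range(5)]

    with Pool(processes=5) as pool:
        # map_async returns immediately with an AsyncResult instead of the outputs
        result = pool.map_async(f, chunks)

        # Keep printing until every call in the map has finished
        while not result.ready():
            print("Running...")
            time.sleep(0.5)

        # .get() returns the list of outputs (and re-raises any worker exception)
        return sum(result.get())

if __name__ == "__main__":
    print(f_amp(list(range(50))))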

In this case, two interesting things are happening. First, the code keeps running past the map_async call even though that computation hasn’t actually finished yet. Second, result is no longer a list but an AsyncResult object that holds details about the set of processes being run. One such detail is whether all calls in the map have completed, which the .ready() method returns. So by putting while not result.ready() immediately afterwards, I create a loop that runs until the processes are finished. Once they are done, I use the .get() method to retrieve the list of outputs.
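AsyncResult also has a wait(timeout) method that blocks for at most timeout seconds, so it can replace the separate time.sleep and lets the loop exit as soon as the work is done. The sketch below rewrites the polling loop that way and counts the status messages; get() re-raises an exception from a worker in the caller. It again uses ThreadPool (same interface as Pool) so it runs anywhere, and the function name f_amp_wait and its parameters are illustrative.

```python
import time
from multiprocessing.pool import ThreadPool

def f(a_list):
    out = 0
    for n in a_list:
        out += n*n
        time.sleep(0.1)
    return out

def f_amp_wait(a_list, n_workers=5, interval=0.5):
    chunks = [a_list[i::n_workers] for i in range(n_workers)]
    messages = 0
    with ThreadPool(processes=n_workers) as pool:
        result = pool.map_async(f, chunks)
        # wait() returns when the work is done or the interval expires, whichever is first
        while not result.ready():
            print("Running...")
            messages += 1
            result.wait(interval)
        # get() returns the outputs, or re-raises an exception from a worker
        return sum(result.get()), messages

if __name__ == "__main__":
    total, messages = f_amp_wait(list(range(50)))
    print(f"total={total} after {messages} status messages")
```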

So there you have a simple intro to multiprocessing. You should now know how to split up loops with multiprocessing to make them faster, and how to use it to print messages about an ongoing function call. For more, look up tutorials on the threading, multiprocessing and asyncio modules.
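If you would rather report which pieces have finished than print a generic message, imap_unordered yields each result as soon as its call completes. A minimal sketch, assuming the same interleaved chunking as above and ThreadPool for portability; f_progress is a hypothetical name and the message format is my own.

```python
import time
from multiprocessing.pool import ThreadPool

def f(a_list):
    out = 0
    for n in a_list:
        out += n*n
        time.sleep(0.1)
    return out

def f_progress(a_list, n_workers=5):
    chunks = [a_list[i::n_workers] for i in range(n_workers)]
    total = 0
    with ThreadPool(processes=n_workers) as pool:
        # Results arrive in completion order, not input order;
        # that's fine here because addition doesn't care about order
        for done, partial in enumerate(pool.imap_unordered(f, chunks), start=1):
            total += partial
            print(f"Finished chunk {done} of {n_workers}")
    return total

if __name__ == "__main__":
    print(f_progress(list(range(20))))
```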

