For any non-trivial code that is considered "slow", I bet at some point one of your engineers has wondered how it could be sped up by executing in parallel or concurrently. Setting aside solutions at the infrastructure or system design level (scaling out or MapReduce respectively, for example), this article looks at the options at the code execution level using Python.

Python's 3 "concurrent" paradigms

In the Python world, there are 3 main ways to achieve concurrent code execution (or at least "near-concurrent"):

Multiprocessing
  • Uses separate processes.
  • Each core in your CPU can then work on tasks concurrently.
  • Spawning processes is typically slower than spawning threads.
  • Each process has its own memory space, meaning global variables in your program are not affected by other processes.
  Best for: CPU bound jobs

Threading
  • Uses separate threads.
  • The Global Interpreter Lock (GIL) forces only one thread to be working at a time, but when a thread is "waiting" (eg network I/O) the operating system yields control to another thread to do its work, ie "near concurrent".
  • Faster to start a new thread than a new process.
  • Threads share a common memory space, so global variables in your program can be mutated by different threads.
  Best for: I/O bound, fast I/O, limited number of connections

Asyncio / coroutines
  • Asyncio is only available from Python 3.5+.
  • Coroutines are very similar to threads, but allow the engineer to explicitly decide when to yield control to other coroutines (rather than the system deciding, as it does for threads).
  • "Near concurrent" on a single thread.
  Best for: I/O bound, slow I/O, many connections


Multiprocessing

With multiprocessing, you start multiple processes which each run in their own separate memory space.

The processes can either be spawned (ie a fresh process is started) or forked (the new process is identical to the parent process up to the point of being created). Your start method (either spawning or forking) can be configured to override the default method, which differs by operating system.
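As a minimal sketch of choosing a start method, `multiprocessing.get_context()` gives you a context bound to a specific method without changing the global default (the function name `report` below is just an illustration):

```python
import multiprocessing

def report():
    # Print the name of the process this function runs in
    print("Hello from", multiprocessing.current_process().name)

if __name__ == "__main__":
    # "spawn" starts a fresh interpreter in the child (the default on
    # Windows and newer macOS); "fork" copies the parent and has long
    # been the default on Linux.
    ctx = multiprocessing.get_context("spawn")
    p = ctx.Process(target=report)
    p.start()
    p.join()
```

Using a context rather than `multiprocessing.set_start_method()` keeps the choice local, which is handy if library code elsewhere depends on the platform default.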

Since processes each run in their own memory space, the advantage is that you don't have to worry about data corruption and deadlocks, ie the usual problems associated with threading.

However the drawbacks to using multiprocessing are:

  1. Processes are slower to start, so unless it's necessary threading / coroutines might be more appropriate. The start up "fixed cost" makes multiprocessing beneficial if the task run by each process takes substantially longer than the start up time.
  2. Memory usage on your system will likely increase, as each process requires its own memory space.

Here's what a simple implementation of multiprocessing looks like:

import multiprocessing

def add_two(number):
    print('Addition:', number + 2)

def subtract_two(number):
    print('Subtraction:', number - 2)

if __name__ == "__main__":
    number = 7
    p1 = multiprocessing.Process(target=add_two, args=(number,))
    p2 = multiprocessing.Process(target=subtract_two, args=(number,))

    # Start both processes, then wait for them to finish
    p1.start()
    p2.start()
    p1.join()
    p2.join()

The join() you see here is the conceptual opposite of fork, in that it asks the master process to wait until the child processes have been "joined" back to the master process. Without this, the master process could terminate early, leaving orphaned child processes.

If you really needed to, you could also exchange data between processes using queues and pipes.


Threading

Threading uses multiple threads in the same process, with each thread sharing the same memory space.

This shared memory space is both the source of threading's main advantage over multiprocessing (faster start!) and its main disadvantage - data synchronisation and the numerous headaches caused when the order of code execution in a function is not guaranteed (ie no more "local reasoning").
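The usual remedy for that synchronisation headache is to protect shared mutable state with a lock. A minimal sketch (the counter, thread count, and iteration count are arbitrary):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        # Without the lock, this read-modify-write could interleave
        # between threads and lose updates.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000
```

With the lock, the final count is deterministic; remove it and the result may silently come up short, which is exactly the loss of "local reasoning" described above.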

Still, threading might be a suitable solution for cases where your code is I/O bound and spends a lot of time waiting for results from a remote source. Like retrieving data from an external API for example, where you could have multiple threads downloading from different API endpoints "near concurrently". 

Note: I mention "near concurrently" because in Python the threads are not all actually working in parallel at all times. In the most common "default" Python (ie CPython), the Global Interpreter Lock is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. Effectively, this means that threads yield control to other threads at certain points (uh... I/O!) so that overall your tasks are completed faster, but the CPU steps are not actually being processed concurrently.

Here's what a threading implementation might look like:

import threading

def add_two(number):
    print('Addition:', number + 2)
    return (number + 2)

if __name__ == "__main__":
    number = 7
    num_threads = 2   # Number of threads to create

    threads = []
    for i in range(num_threads):
        thread = threading.Thread(target=add_two, args=(number,))
        thread.start()
        threads.append(thread)

    # Wait for all threads to finish
    for thread in threads:
        thread.join()

    print("Thread processing complete.")

Asyncio / coroutines

Asyncio has been part of the standard library since Python 3.5, and uses coroutines to handle similar problems as threading (without the issues of threading, of course!).

Coroutines run on a single thread, but allow the engineer to stipulate when control is yielded back to the event loop. I think of it as threading where you (the engineer) have greater control over when control is yielded, as compared to threading where the system mostly handles this.

Asyncio is therefore useful for situations where you have relatively slow I/O and want to explicitly specify when control is yielded.

Here's what it might look like:

import time
import asyncio

async def add_two(number):
    print('Addition started work: {}'.format(time.time()))
    await asyncio.sleep(5)
    print("Addition: %s" % (number + 2))
    print('Addition ended work: {}'.format(time.time()))

async def subtract_two(number):
    print('Subtraction started work: {}'.format(time.time()))
    await asyncio.sleep(5)
    print("Subtraction: %s" % (number - 2))
    print('Subtraction ended work: {}'.format(time.time()))

if __name__ == "__main__":
    number = 7

    async def main():
        # gather() runs both coroutines concurrently on the event loop
        await asyncio.gather(add_two(number), subtract_two(number))

    # asyncio.run() is the recommended entry point from Python 3.7+;
    # on 3.5/3.6 you would instead use loop = asyncio.get_event_loop()
    # and loop.run_until_complete(asyncio.gather(...)).
    asyncio.run(main())


Hopefully this article has served as an introduction to a fairly complex subject in Python programming.

The general consensus for when to use each of these seems to be:

  1. CPU bound => Multiprocessing
  2. I/O bound, fast I/O, limited number of connections => Threading
  3. I/O bound, slow I/O, many connections => Asyncio