Summary: in this tutorial, you’ll learn how to run code in parallel using the Python multiprocessing module.
Introduction to the Python multiprocessing
Generally, programs deal with two types of tasks:
- I/O-bound tasks: if a task does a lot of input/output operations, it’s called an I/O-bound task. Typical examples of I/O-bound tasks are reading from files, writing to files, connecting to databases, and making a network request. For I/O-bound tasks, you can use multithreading to speed them up.
- CPU-bound tasks: when a task does a lot of operations using a CPU, it’s called a CPU-bound task. For example, number calculation, image resizing, and video streaming are CPU-bound tasks. To speed up the program with lots of CPU-bound tasks, you use multiprocessing.
Multiprocessing allows two or more processors to simultaneously process two or more different parts of a program.
In Python, you use the multiprocessing
module to implement multiprocessing.
Python multiprocessing example
See the following program:
import time
def task():
result = 0
for _ in range(10**8):
result += 1
return result
if __name__ == '__main__':
start = time.perf_counter()
task()
task()
finish = time.perf_counter()
print(f'It took {finish-start:.2f} second(s) to finish')
Code language: Python (python)
Output:
It took 5.55 second(s) to finish
Code language: Python (python)
How it works.
First, define the
function is a CPU-bound task because it performs a heavy computation by executing a loop for 100 million iterations and incrementing a variable task()
result
:
def task():
result = 0
for _ in range(10**8):
result += 1
return result
Code language: Python (python)
Second, call the task()
functions twice and record the processing time:
if __name__ == '__main__':
start = time.perf_counter()
task()
task()
finish = time.perf_counter()
print(f'It took {finish-start: .2f} second(s) to finish')
Code language: Python (python)
On our computer, it took 5.55 seconds to complete.
Using multiprocessing module
The following program uses the multiprocessing module but takes less time:
import time
import multiprocessing
def task() -> int:
result = 0
for _ in range(10**8):
result += 1
return result
if __name__ == '__main__':
start = time.perf_counter()
p1 = multiprocessing.Process(target=task)
p2 = multiprocessing.Process(target=task)
p1.start()
p2.start()
p1.join()
p2.join()
finish = time.perf_counter()
print(f'It took {finish-start:.2f} second(s) to finish')
Code language: Python (python)
Output:
It took 3.43 second(s) to finish
Code language: Python (python)
How it works.
First, import the multiprocessing module:
import multiprocessing
Code language: Python (python)
Second, create two processes and pass the task function to each:
p1 = multiprocessing.Process(target=task)
p2 = multiprocessing.Process(target=task)
Code language: Python (python)
Note that the Process()
constructor returns a new Process
object.
Third, call the start()
method of the Process
objects to start the process:
p1.start()
p2.start()
Code language: Python (python)
Finally, wait for the processes to complete by calling the join()
method:
p1.join()
p2.join()
Code language: Python (python)
Python multiprocessing practical example
We’ll use the multiprocessing module to resize the high-resolution images.
First, install the Pillow
library for image processing:
pip install Pillow
Code language: Python (python)
Second, develop a program that creates thumbnails of the pictures in the images
folder and save them to the thumbs
folder:
import time
import os
from PIL import Image, ImageFilter
filenames = [
'images/1.jpg',
'images/2.jpg',
'images/3.jpg',
'images/4.jpg',
'images/5.jpg',
]
def create_thumbnail(filename, size=(50,50), thumb_dir ='thumbs'):
# open the image
img = Image.open(filename)
# apply the gaussian blur filter
img = img.filter(ImageFilter.GaussianBlur())
# create a thumbnail
img.thumbnail(size)
# save the image
img.save(f'{thumb_dir}/{os.path.basename(filename)}')
# display a message
print(f'{filename} was processed...')
if __name__ == '__main__':
start = time.perf_counter()
for filename in filenames:
create_thumbnail(filename)
finish = time.perf_counter()
print(f'It took {finish-start:.2f} second(s) to finish')
Code language: Python (python)
On our computer, it took about 4.06 seconds to complete:
images/1.jpg was processed...
images/2.jpg was processed...
images/3.jpg was processed...
images/4.jpg was processed...
images/5.jpg was processed...
It took 4.06 second(s) to finish
Code language: Python (python)
Third, modify the program to use multiprocessing. Each process will create a thumbnail for a picture:
import time
import os
from PIL import Image, ImageFilter
import multiprocessing
filenames = [
'images/1.jpg',
'images/2.jpg',
'images/3.jpg',
'images/4.jpg',
'images/5.jpg',
]
def create_thumbnail(filename, size=(50,50), thumb_dir ='thumbs'):
# open the image
img = Image.open(filename)
# apply the gaussian blur filter
img = img.filter(ImageFilter.GaussianBlur())
# create a thumbnail
img.thumbnail(size)
# save the image
img.save(f'{thumb_dir}/{os.path.basename(filename)}')
# display a message
print(f'{filename} was processed...')
def main():
start = time.perf_counter()
# create processes
processes = [multiprocessing.Process(target=create_thumbnail, args=[filename])
for filename in filenames]
# start the processes
for process in processes:
process.start()
# wait for completion
for process in processes:
process.join()
finish = time.perf_counter()
print(f'It took {finish-start:.2f} second(s) to finish')
if __name__ == '__main__':
main()
Code language: Python (python)
Output:
images/5.jpg was processed...
images/4.jpg was processed...
images/1.jpg was processed...
images/3.jpg was processed...
images/2.jpg was processed...
It took 2.92 second(s) to finish
Code language: Python (python)
In this case, the output shows that the program processed the pictures faster.
Summary
- Use Python multiprocessing to run code in parallel to deal with CPU-bound tasks.