python multiprocessing vs. threading programing on multicore – simple example

In previous post (here) I described some approach how I overcame some limitation of Matplotlib when plotting large data sets. I also mentioned that using both cores of processor could further speed up calculation. Now we’ll see some short example how to use multicore and threading. According the python’s documentation, multiprocessing APIs was designed to mimic those of threading. In the next we will see simple examples and some comparison.

Here is some example that uses threading class to add some random number to the list. We create two threads (thread ‘A’ and ‘B’) and run them. Each thread let us know, when finished.

#!/usr/bin/python
import random
import threading

times = 10000000

def process(count, jobid, output):
  pom = []
  for i in range(count):
  pom.append(random.random())
  print "Job ", jobid, " finished!"

out1 = list()
out2 = list()

thread1 = threading.Thread(target=process(times, 'A', out1))
thread2 = threading.Thread(target=process(times, 'B', out2))

job = []
job.append(thread1)
job.append(thread2)

for i in job:
  i.start()
for i in job:
  i.join()

print "Finished!"

We created two threads (thread1 and thread2) and assign them target to do. then we run them and wait for result. This time just to write that they are finished. Output on dual-core processor can look like this;

box$ time python test-thread.py
Job  A  finished!
Job  B  finished!
Finished!

real    0m8.175s
user    0m7.558s
sys     0m0.595

Here it took over 8 s to feel the list with 10 000 000 random numbers (twice). Using multiprocessing, we can rewrite the code something like this;

#!/usr/bin/python
import multiprocessing as mp
from multiprocessing import Process
import random

times = 10000000

def process(count, jobid, output):
  pom = []
  for i in range(count):
  pom.append(random.random())
  print "Job ", jobid, " finished!"

out1 = list()
out2 = list()

job = []
job.append(Process(target=process, args=(times, 'A', out1)))
job.append(Process(target=process, args=(times, 'B', out2)))

for j in job:
  j.start()
for j in job:
  j.join()

print "Finished!"

Syntax is pretty similar. We run this code on dual-core processor and output can look like this;

box$ time python test-multiproc.py
Job  B  finished!
Job  A  finished!
Finished!

real    0m4.834s
user    0m7.619s
sys     0m1.001s

Process ‘B’ finished faster then process ‘A’. It is because they run concurrently, whereas in threading case we run threads one after the other. We can see, that consumed user time is about the same; however, the real time it took us to co this job was shorter (8.175s vs. 4.834s). The reason is, that Python have something called Python Global Interpreter Lock (GIL). Even we create multiple threads, they will run just on one core. In case of multiprocessing the load is distributed on more cores.

Time of the multiprocessing case is not exactly half, since there is also some time when we are running just one thread (app itself) and management takes probably also some time. In case of long time running multiprocessing code there can be proximately 50% time save when used two cores except one. So with X cores (X > 1) code should run proximately X-times faster.

I used similar multiprocessing code to calculate rays in ray tracing. Each core got “several” rays to calculate and if there was split, then the child ray was added at the end of the feed in list. To synchronize the read and write to the same list, it is probably better to use Manager, that will take care of that.

from multiprocessing import Process, Manager
...
manager = Manager()
out1 = manager.list()
out2 = manager.list()
...

This way we have managed list, which we can read from and write to safely.

Threading is good enough in many applications. E.g. when we need something to measure and we want application (app window) to be responsive during measurement (measurement can be one thread and app run as other thread). If we need to run the code concurrently, then there is multiprocessing. More information can be found on the Python’s documentation pages or in forums, e.g. http://stackoverflow.com/questions/990102/python-global-interpreter-lock-gil-workaround-on-multi-core-systems-using-tasks.

8 thoughts on “python multiprocessing vs. threading programing on multicore – simple example”

Vladimir July 23, 2011 at 07:03

Martin,
please correct me if I’m wrong, but shouldn’t you be using stuff like assembler, when speed is essential ? You complain about the speed of the code and use script language like python ? Surely it strongly depends on the nature of your computations, but have you ever heard about CUDA technology ? you should give it a try and forget about regular CPUs 😉 (but that’s only my humble opinion) – unless of course it was not give it to you as task, to make python run faster using multiple cores

Martin Post authorJuly 23, 2011 at 07:38

You are right, CUDA would be better, but on the Intel based computer with integrated graphics you can not do much. This was just survey to see, if the algorithm (and equations) are right. It turned out that the bottleneck was not the computation of the paths but the plotting it. Calculation took just 1-2 min finally. Matplotlib (plotting library for python) was not so suitable for this task as it could not handle many many lines. Finally we did not go deeper, but I was already checking the PlayStation 3 as possible (cheep) parallel platform. However, new graphic card was another possible solution, but along with new computer. 🙂

Vladimir September 12, 2011 at 06:22

speaking of which… recently I made some benchmarks with quadcore computers using aircrack software and it turns out, that 60 core farm (which costs about 70k euro) is as fast as 1 top GPU today (which is at most 1k euro)… (yes it depends on nature of problems being solved, but still. I was shocked with this information)

Martin October 25, 2011 at 21:09

Nice price difference … Have seen super computer made of 1500 PlayStations 3. Quite cheep for that power. But now, Sony closed the door.

insider January 23, 2012 at 06:01

Can you explain how to get sorted output of many subprocesses. For example i have following code:
#!/usr/bin/env python


from multiprocessing import Process, Lock

import os
def ping(ip):

  report = ("No response","Partial Response","Alive")

  pingaling = os.popen("ping -q -c2 "+str(ip),"r")

  while 1:

    line = pingaling.readline()

    try:

      result = line[line.find(','):].split()[1]

      output = report[int(result[0])]

    except:

      pass

    if not line: break

  print "Testing %s : %s!" % (ip, output)

if __name__ == '__main__': hosts = ['81.24.212.'+str(x) for x in range(10)] for i in hosts: p = Process(target=ping, args=(i,)) p.start() p.join()

Which is giving following output:
insider@localhost:trunk$ time python simple_multproc2.py Testing 81.24.212.1 : Alive! Testing 81.24.212.2 : Alive! Testing 81.24.212.6 : Alive! Testing 81.24.212.4 : No response! Testing 81.24.212.3 : No response! Testing 81.24.212.0 : No response! Testing 81.24.212.8 : No response! Testing 81.24.212.5 : No response! Testing 81.24.212.9 : No response! Testing 81.24.212.7 : No response!

real 0m11.060s user 0m0.020s sys 0m0.020s
How can i get it sorted by ip using Manager or anything else?

Martin Post authorJanuary 23, 2012 at 12:51

@insider:
Manager is used to manage the access of several processes to the same object so the changes of the object are properly handled.
If process #6 finishes before process #5 then it will log to managed object before the process #5.
I would log all output and then just sort the output. This is just quick solution that came up to my mind right now.

insider January 24, 2012 at 18:39

I have tried both using a global list or another function that appends to list, but at the end i have empty list. The only solution to sort output i’ve found was writing it to file and then sorting it.

Martin Post authorJanuary 25, 2012 at 10:58

Using this piece of code
#!/usr/bin/env python


from multiprocessing import Process, Manager

import time
def ping(num, plist):

	desc = str(num)

	while num:

		plist.append("Process "+desc+" sleep for "+str(num)+" s!")

		time.sleep(num)

		num -= 1
if __name__ == '__main__':

	manager = Manager()

	plist = manager.list()
	tlist = list()
	jobs = []

	for i in range(5):

		jobs.append(Process(target=ping, args=(i,plist)))

	for j in jobs:

		j.start()

	for j in jobs:

		j.join()

for data in plist: print data

the output looks like
Process 1 sleep for 1 s!
Process 3 sleep for 3 s!
Process 2 sleep for 2 s!
Process 4 sleep for 4 s!
Process 2 sleep for 1 s!
Process 3 sleep for 2 s!
Process 4 sleep for 3 s!
Process 3 sleep for 1 s!
Process 4 sleep for 2 s!
Process 4 sleep for 1 s!

if the plist is swapped with tlist the output is empty.
[ jobs.append(Process(target=ping, args=(i,tlist))) and then for data in tlist: ]
Order of the data in the list depends as they were writen by given process.

Citing from documentation:
A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array. http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes

I use list like this to record milion lines of data output; however, I do not care on the order…

www.martinkral.sk

python multiprocessing vs. threading programing on multicore – simple example

8 thoughts on “python multiprocessing vs. threading programing on multicore – simple example”

Leave a Reply Cancel reply