In a previous post (here) I described an approach that helped me overcome some limitations of Matplotlib when plotting large data sets. I also mentioned that using both processor cores could further speed up the calculation. Now we'll look at a short example of how to use multiprocessing and threading. According to Python's documentation, the multiprocessing API was designed to mimic that of threading. Below we will see simple examples and a short comparison.
Here is an example that uses the threading module to fill a list with random numbers. We create two threads ('A' and 'B') and run them. Each thread lets us know when it has finished.
#!/usr/bin/python
import random
import threading

times = 10000000

def process(count, jobid, output):
    pom = []
    for i in range(count):
        pom.append(random.random())
    print "Job", jobid, "finished!"

out1 = list()
out2 = list()

thread1 = threading.Thread(target=process, args=(times, 'A', out1))
thread2 = threading.Thread(target=process, args=(times, 'B', out2))

job = []
job.append(thread1)
job.append(thread2)

for i in job:
    i.start()
for i in job:
    i.join()
print "Finished!"
We created two threads (thread1 and thread2) and assigned them a target to run. Then we started them and waited for the results; this time they just report that they are finished. The output on a dual-core processor can look like this:
box$ time python test-thread.py
Job A finished!
Job B finished!
Finished!

real    0m8.175s
user    0m7.558s
sys     0m0.595s
Here it took over 8 s to fill the list with 10 000 000 random numbers (twice). Using multiprocessing, we can rewrite the code like this:
#!/usr/bin/python
from multiprocessing import Process
import random

times = 10000000

def process(count, jobid, output):
    pom = []
    for i in range(count):
        pom.append(random.random())
    print "Job", jobid, "finished!"

out1 = list()
out2 = list()

job = []
job.append(Process(target=process, args=(times, 'A', out1)))
job.append(Process(target=process, args=(times, 'B', out2)))

for j in job:
    j.start()
for j in job:
    j.join()
print "Finished!"
The syntax is pretty similar. Running this code on a dual-core processor, the output can look like this:
box$ time python test-multiproc.py
Job B finished!
Job A finished!
Finished!

real    0m4.834s
user    0m7.619s
sys     0m1.001s
Process 'B' finished earlier than process 'A'. That is because the processes run concurrently, whereas in the threading case the work effectively ran one part after the other. We can see that the consumed user time is about the same; however, the real time it took to do the job was shorter (8.175 s vs. 4.834 s). The reason is that Python has something called the Global Interpreter Lock (GIL): even if we create multiple threads, they run on just one core at a time. With multiprocessing the load is distributed over more cores.
The multiprocessing time is not exactly half, since there are phases where only one process (the app itself) is running, and the process management also takes some time. For long-running multiprocessing code the saving approaches 50% when two cores are used instead of one, so with X cores (X > 1) the code should run approximately X times faster.
I used similar multiprocessing code to calculate rays in ray tracing. Each core got "several" rays to calculate, and if a ray split, the child ray was appended to the end of the feed-in list. To synchronize reads and writes to the same list, it is better to use a Manager, which takes care of that:
from multiprocessing import Process, Manager
...
manager = Manager()
out1 = manager.list()
out2 = manager.list()
...
This way we have a managed list which we can read from and write to safely.
Threading is good enough for many applications, e.g. when we run a measurement and want the application (app window) to stay responsive during it (the measurement can run in one thread while the app runs in another). If we need to run code concurrently on several cores, there is multiprocessing. More information can be found in Python's documentation or in forums, e.g. http://stackoverflow.com/questions/990102/python-global-interpreter-lock-gil-workaround-on-multi-core-systems-using-tasks.
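That responsive-app pattern can be sketched like this (written for Python 3; the "measurement" here is just a short sleep loop standing in for real work):

```python
import threading
import time

results = []

def measure():
    # stand-in for a long-running measurement
    for i in range(3):
        time.sleep(0.1)
        results.append(i)

t = threading.Thread(target=measure)
t.start()
# the main thread stays free while the measurement runs in the background
while t.is_alive():
    time.sleep(0.05)   # a real app would process UI events here instead
t.join()
print(results)
```

Because the measurement thread mostly waits (sleeps or blocks on I/O), the GIL is not a problem in this case; the main thread gets plenty of time to run.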
Please correct me if I'm wrong, but shouldn't you be using something like assembler when speed is essential? You complain about the speed of the code and use a scripting language like Python? Surely it strongly depends on the nature of your computations, but have you ever heard about CUDA technology? You should give it a try and forget about regular CPUs 😉 (but that's only my humble opinion) – unless of course the task given to you was to make Python run faster using multiple cores.
You are right, CUDA would be better, but on an Intel-based computer with integrated graphics you cannot do much. This was just a survey to see whether the algorithm (and equations) were right. It turned out that the bottleneck was not the computation of the paths but plotting them; the calculation finally took just 1-2 minutes. Matplotlib (a plotting library for Python) was not so suitable for this task, as it could not handle that many lines. In the end we did not go deeper, but I was already checking the PlayStation 3 as a possible (cheap) parallel platform. A new graphics card was another possible solution, but that would have meant a new computer as well. 🙂
Speaking of which… recently I made some benchmarks on quad-core computers using the aircrack software, and it turns out that a 60-core farm (which costs about 70k euro) is as fast as one top GPU today (which costs at most 1k euro)… (yes, it depends on the nature of the problems being solved, but still, I was shocked by this information)
Nice price difference… I have seen a supercomputer made of 1500 PlayStation 3s. Quite cheap for that power. But now Sony has closed that door.
Can you explain how to get sorted output from many subprocesses? For example, I have the following code:
from multiprocessing import Process, Lock
import os

def ping(ip):
    report = ("No response", "Partial Response", "Alive")
    pingaling = os.popen("ping -q -c2 " + str(ip), "r")
    output = report[0]
    while 1:
        line = pingaling.readline()
        if not line:
            break
        if 'received' in line:
            result = line[line.find(','):].split()
            output = report[int(result[1])]
    print "Testing %s : %s!" % (ip, output)

if __name__ == '__main__':
    hosts = ['81.24.212.' + str(x) for x in range(10)]
    for i in hosts:
        p = Process(target=ping, args=(i,))
        p.start()
Which gives the following output:
insider@localhost:trunk$ time python simple_multproc2.py
Testing 81.24.212.1 : Alive!
Testing 81.24.212.0 : Alive!
Testing 81.24.212.2 : Alive!
Testing 81.24.212.4 : No response!
Testing 81.24.212.3 : No response!
Testing 81.24.212.6 : No response!
Testing 81.24.212.5 : No response!
Testing 81.24.212.8 : No response!
Testing 81.24.212.9 : No response!
Testing 81.24.212.7 : No response!
How can I get it sorted by IP, using Manager or anything else?
Manager is used to manage the access of several processes to the same object, so that changes to the object are handled properly.
If process #6 finishes before process #5, it will still log to the managed object before process #5 does.
I would log all the output and then just sort it afterwards. That is the quick solution that comes to my mind right now.
I have tried both using a global list and another function that appends to a list, but at the end I have an empty list. The only solution I've found to sort the output was writing it to a file and then sorting that.
Using this piece of code
from multiprocessing import Process, Manager
from time import sleep

def ping(num, plist):
    desc = str(num)
    while num > 0:
        plist.append("Process " + desc + " sleep for " + str(num) + " s!")
        num -= 1
        sleep(1)

if __name__ == '__main__':
    manager = Manager()
    plist = manager.list()
    tlist = list()
    jobs = []
    for i in range(5):
        jobs.append(Process(target=ping, args=(i, plist)))
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    for data in plist:
        print data
the output looks like this:
Process 1 sleep for 1 s!
Process 3 sleep for 3 s!
Process 2 sleep for 2 s!
Process 4 sleep for 4 s!
Process 2 sleep for 1 s!
Process 3 sleep for 2 s!
Process 4 sleep for 3 s!
Process 3 sleep for 1 s!
Process 4 sleep for 2 s!
Process 4 sleep for 1 s!
If plist is swapped with tlist, i.e.

jobs.append(Process(target=ping, args=(i, tlist)))

and then

for data in tlist:
    print data

the output is empty.
The order of the data in the list depends on when it was written by each process.
Citing from the documentation:
A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array. http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes
I use a list like this to record millions of lines of data output; however, I do not care about the order…