Plotting large data sets with Python (without Matplotlib)

Python and Matplotlib are a very good combination for processing data and presenting it. But this time, while trying to do some “ray tracing”, I ran into problems plotting large, or rather huge, amounts of data. The task was to draw anywhere from several thousand to millions of lines in one plot, each with an opacity close to 0.005, and see how they blend together.
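To get a feeling for why such a low opacity makes sense, here is a rough sketch of how faint lines add up where they overlap. It assumes the viewer uses ordinary "source-over" alpha blending, which is my assumption and not something I measured; the function name stacked_opacity is made up for this illustration.

```python
def stacked_opacity(alpha, n):
    """Combined opacity of n overlapping strokes, each drawn with opacity alpha.

    Assumes plain source-over alpha compositing: every stroke lets
    (1 - alpha) of what is underneath shine through.
    """
    return 1.0 - (1.0 - alpha) ** n


# How visible is a spot crossed by 1, 100 or 1000 rays at opacity 0.005?
for n in (1, 100, 1000):
    print(n, round(stacked_opacity(0.005, n), 3))
```

Under that assumption a single ray is practically invisible, while places crossed by many rays quickly approach full intensity.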

Comparing a Linux computer with 756 MB of RAM and a Mac with 2 GB of RAM, it was obvious that more RAM is better, but even 2 GB was not enough. I tried several backends (agg, pdf, ps, cairo, macosx) and file formats (pdf, eps, png, svg) without any improvement: a run would still take several hours to finish. With the svg file format even rendering the image took time, as it is an XML-based format. For example, Inkscape or Gimp could generate the output image in about 2 to 4 minutes. But then there was The Idea.

Why not store the results directly in the SVG file format and then let some other dedicated software display them?

Modifying the program was easy and the results were calculated super fast: several hours compared to 10 minutes is quite a difference. Using both cores of the processor also helped to cut the time roughly in half. The output data were stored in a list that was then written to a file. The format is as follows.

First we need to write the header of the SVG file, which can look similar to this:

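# Write the SVG header; a triple-quoted string may span several lines.
# The xmlns attribute tells viewers that the document is SVG.
saveFile.write("""<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"">
<svg width="1200pt" height="800pt" viewBox="0 0 1200 800"
xmlns="http://www.w3.org/2000/svg" version="1.1">
""")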

I copied this from another SVG file and modified some values, such as “width”, “height” and “viewBox”, to fit my needs. Then each line we want to plot was written like this:

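# One group with one path per ray; (c, d) is the start and (e, f) the end point.
# Single-quoted pieces avoid escaping the double quotes that SVG needs.
saveFile.write(
    '\n<g id="' + str(process) + str(Pocet) + '">\n'
    '<path style="fill: none; stroke: #ffffff; stroke-width: 1.000000; '
    'stroke-linejoin: round; stroke-linecap: square; '
    'opacity:' + str(intensityFactor*ri) + ' " '
    'd="M' + str((c+60)*10) + ' ' + str((d+40)*10) + ' '
    + str((e+60)*10) + ' ' + str((f+40)*10) + '"/>\n</g>')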

Here the “path” tag and its “stroke-width”, “opacity” and “d” values are of interest. The “path” tag (<path>), for example

<path d="M a b c d" style="fill:none;stroke:#fff;stroke-width:1.0"/>

defines a path. “M” moves the pen to the position given by the “a” and “b” coordinates, and the following pair of coordinates, “c” and “d”, is treated as an implicit line-to, so a line is drawn to that position. More about the tag can be found on the W3Schools page.
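Put together, the whole thing can be sketched roughly as below. This is a minimal illustration, not my original program: the names SVG_HEADER, svg_line, write_svg and count_paths are made up for this example, the DOCTYPE is left out for brevity, and the input is assumed to be an iterable of (x1, y1, x2, y2, opacity) tuples already in viewBox coordinates.

```python
import xml.etree.ElementTree as ET

# Minimal header; the size and viewBox values are taken from the text above.
SVG_HEADER = (
    '<?xml version="1.0" standalone="no"?>\n'
    '<svg xmlns="http://www.w3.org/2000/svg" version="1.1" '
    'width="1200pt" height="800pt" viewBox="0 0 1200 800">\n'
)
SVG_FOOTER = '</svg>\n'


def svg_line(x1, y1, x2, y2, opacity, stroke="#ffffff", width=1.0):
    """Return one <path> element that draws a straight line."""
    return (
        '<path d="M%.2f %.2f %.2f %.2f" '
        'style="fill:none;stroke:%s;stroke-width:%.2f;opacity:%.6f"/>\n'
        % (x1, y1, x2, y2, stroke, width, opacity)
    )


def write_svg(filename, segments):
    """Write an iterable of (x1, y1, x2, y2, opacity) tuples as an SVG file."""
    with open(filename, "w") as out:
        out.write(SVG_HEADER)
        for segment in segments:
            out.write(svg_line(*segment))
        out.write(SVG_FOOTER)


def count_paths(filename):
    """Parse the file (which also checks it is well-formed XML) and count paths."""
    root = ET.parse(filename).getroot()
    return len(root.findall("{http://www.w3.org/2000/svg}path"))
```

A call such as write_svg("rays.svg", [(0, 0, 1200, 800, 0.005)]) writes one faint diagonal line. Writing each element straight to the file, instead of building one huge string first, keeps the memory use low even for millions of lines.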

In this way we can write out all the data and then let some other program with SVG capabilities display them. The SVG output is human readable and can be compressed if necessary (e.g. for transfer). The output file in my case is about 180 MB and contains about 1,000,000 lines plotted with different opacities. Opacity here acts as an extra dimension and represents the intensity of the ray beam. The overall combination of the rays can reveal a preferred direction or some other pattern, and helps our imagination.
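For the compression, one option is Python's standard gzip module. The sketch below is only an illustration; the function name compress_svg and the file names are placeholders. Gzip-compressed SVG files conventionally use the .svgz extension.

```python
import gzip
import shutil


def compress_svg(svg_path, svgz_path):
    """Gzip-compress svg_path into svgz_path.

    The data is streamed in chunks, so the file does not have to fit in RAM.
    """
    with open(svg_path, "rb") as src, gzip.open(svgz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Used as compress_svg("rays.svg", "rays.svgz"). Because the file consists of nearly identical path elements, it should compress well, although the exact ratio depends on the data.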

The other method I used was to create my own canvas as a matrix and simply add the intensity to the given cells. This was the idea of my colleague Jano. Due to the lack of anti-aliasing the output was not so appealing, and this approach came at the cost of increased calculation time. Then the SVG output file came along and that was it.
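The matrix idea can be sketched in a few lines of plain Python. This is a reconstruction for illustration only, not our original code: the names make_canvas and add_line are made up, and the line is rasterized with a simple stepping scheme along the longer axis, without any anti-aliasing.

```python
def make_canvas(width, height):
    """Create a height x width matrix of zeros (list of rows)."""
    return [[0.0] * width for _ in range(height)]


def add_line(canvas, x1, y1, x2, y2, intensity):
    """Add intensity to every cell on the segment (x1, y1)-(x2, y2).

    End points are rounded to whole cells. Stepping one cell at a time
    along the longer axis means every cell is touched exactly once.
    """
    height, width = len(canvas), len(canvas[0])
    c1, r1, c2, r2 = round(x1), round(y1), round(x2), round(y2)
    dc, dr = c2 - c1, r2 - r1
    n = max(abs(dc), abs(dr))
    for i in range(n + 1):
        col = c1 + (round(i * dc / n) if n else 0)
        row = r1 + (round(i * dr / n) if n else 0)
        # Silently skip the parts of the line that fall outside the canvas.
        if 0 <= row < height and 0 <= col < width:
            canvas[row][col] += intensity


canvas = make_canvas(5, 5)
add_line(canvas, 0, 0, 4, 4, 0.5)    # diagonal ray
add_line(canvas, 0, 0, 4, 0, 0.25)   # horizontal ray along the top row
```

After these two calls the cell where both rays cross holds the summed intensity, which is the blending effect we were after. The pure-Python loop over every cell is also a plausible reason why this approach was slow.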

This post was just meant to present an idea that can be used in many different ways and cases, in the hope that somebody finds it helpful.
