\( \newcommand{\Order}[1]{\mathcal{O}(#1)} \newcommand{\TEXT}{\mathcal{T}} \)

In this post, I want to present a new feature of the succinct data structure library, which I really like. The problem was the following. When you read a theory paper on succinct data structures you can get easily lost in all those space complexity terms. Often it is hard to imagine how big e.g. \(\log\frac{1}{H_k}\) would be or which constant is hidden in a \(\Order{\cdot}\)-term. It is even worse, if the data structure is composed of many other succinct substructures.

In sdsl you can easily determine for each object how many space is consumed. Say you have an object X then the methods util::get_size_in_bytes or util::get_size_in_mega_bytes will answer you the question. However, you do not know the breakdown to the different components of the data structure. We have addressed that problem and now it is possible to call

write_structure<JSON_FORMAT>(X, std::cout);

to get a hierarchical decomposition of the space usage of a object. The type of the output can be specified by the template parameter. Right now we support output in the JSON format (JSON_FORMAT) and as R lists (R_FORMAT).

Here is an example of the output of an object of a compressed suffix array which is based on a run-length encoded wavelet tree which uses SD arrays. The output contains the information we want to know, however it is hard to read. Therefore lets press to button Show serialize and the JSON output will be transformed in a nice SVG graphic which shows the information in a more accessible way.

The visualization was realized with Mike Bostock's superb D3.js JavaScript library (D3 = Data-Driven Documents). The graphic is a mirrored icicle-graph. You can read the example the following way: The object of class csa_wt has size 158.62 MiB. It has one member named wavelet_tree of type wt_rlmn and this member takes almost all space; namely 158.61 MiB (click on the entry to get this information). The wavelet_tree member itself is composed of three members called bl,bf, and wt. Now you can inspect each of these members by clicking on it. Lets choose bf. The figure will show at the upper part the size and fraction of the selected member. In this example bf takes 54.07 MiB which corresponds to 34.09% of the whole space. Experts will know that the sdsl class sd_vector (64bit implementation of the SD array can be further decomposed into the low and high bit part of the positions of the set one bits and select data structure for the high bit_vector. Explore it if you like ;) To go up the hierarchy again simply click on a element in a higher level and the figure will zoom out again.

And at last the best thing of this blog entry. You can use the form above to paste in your own output and visualize the it!



blog comments powered by Disqus

Published

27 August 2012

Tags