Visualizing data structures in sdsl
In this post, I want to present a new feature of the succinct data structure library, which I really like. The problem was the following. When you read a theory paper on succinct data structures you can get easily lost in all those space complexity terms. Often it is hard to imagine how big e.g. \(\log\frac{1}{H_k}\) would be or which constant is hidden in a \(\Order{\cdot}\)-term. It is even worse, if the data structure is composed of many other succinct substructures.
In sdsl you can easily determine for each
object how many space is consumed. Say you have an object X
then the methods
util::get_size_in_bytes
or util::get_size_in_mega_bytes
will answer
you the question. However, you do not know the breakdown to the different
components of the data structure. We have addressed that problem and
now it is possible to call
write_structure<JSON_FORMAT>(X, std::cout);
to get a hierarchical decomposition of the space usage of a object.
The type of the output can be specified by the template parameter.
Right now we support output in the JSON format
(JSON_FORMAT
)
and as R lists (R_FORMAT
).
Here is an example of the output of an object of a compressed
suffix array which is based on a
run-length encoded wavelet tree
which uses
SD arrays.
The output contains the information we
want to know, however it is hard to read. Therefore lets press
to button Show serialize
and the JSON output will be transformed
in a nice SVG graphic which shows
the information in a more accessible way.
The visualization was realized with Mike Bostock's superb D3.js
JavaScript library (D3 = Data-Driven Documents).
The graphic is a mirrored
icicle-graph.
You can read the example the following way: The object of class
csa_wt
has size 158.62 MiB. It has one member named
wavelet_tree
of type wt_rlmn
and this member takes almost
all space; namely 158.61 MiB (click on the entry to get this information).
The wavelet_tree
member itself is composed of three members
called bl
,bf
, and wt
. Now you can inspect each of these members
by clicking on it. Lets choose bf
. The figure will show at
the upper part the size and fraction of the selected
member. In this example bf
takes 54.07 MiB which corresponds to
34.09% of the whole space. Experts will know that the
sdsl class sd_vector
(64bit implementation of the
SD array can be further
decomposed into the low and high bit part of the positions
of the set one bits and select data structure for the high
bit_vector
. Explore it if you like ;)
To go up the hierarchy again simply click on a element in
a higher level and the figure will zoom out again.
And at last the best thing of this blog entry. You can use the form above to paste in your own output and visualize the it!
blog comments powered by Disqus