SDSL: Succinct Data Structure Library
A C++ template library for succinct data structures
 All Classes Namespaces Files Functions Variables Typedefs Friends
Public Types | Public Member Functions | Public Attributes
sdsl::wt_huff< BitVector, RankSupport, SelectSupport, SelectSupportZero, dfs_shape > Class Template Reference

A Wavelet Tree class for byte sequences. More...

#include <wt_huff.hpp>

List of all members.

Public Types

typedef int_vector::size_type size_type
typedef unsigned char value_type
typedef BitVector bit_vector_type
typedef RankSupport rank_1_type
typedef SelectSupport select_1_type
typedef SelectSupportZero select_0_type

Public Member Functions

template<typename RandomAccessContainer >
 wt_huff (const RandomAccessContainer &rac, size_type size)
 Constructor.
template<uint8_t w>
 wt_huff (const int_vector< w > &rac)
template<typename RandomAccessContainer >
void construct (const RandomAccessContainer &rac, size_type size)
template<class size_type_class >
 wt_huff (int_vector_file_buffer< 8, size_type_class > &rac, size_type size)
template<class size_type_class >
void construct (int_vector_file_buffer< 8, size_type_class > &rac, size_type size)
 Construct the wavelet tree from a random access container.
 wt_huff (const wt_huff &wt)
 Copy constructor.
wt_huffoperator= (const wt_huff &wt)
 Assignment operator.
void swap (wt_huff &wt)
 Swap operator.
size_type size () const
 Returns the size of the original vector.
bool empty () const
 Returns whether the wavelet tree contains no data.
value_type operator[] (size_type i) const
 Recovers the ith symbol of the original vector.
size_type rank (size_type i, value_type c) const
 Calculates how many symbols c are in the prefix [0..i-1] of the supported vector.
size_type rank_ith_symbol (size_type i, value_type &c) const
 Calculates how many occurrences of symbol wt[i] are in the prefix [0..i-1] of the original sequence.
size_type select (size_type i, value_type c) const
 Calculates the ith occurrence of the symbol c in the supported vector.
void interval_symbols (size_type i, size_type j, size_type &k, std::vector< unsigned char > &cs, std::vector< size_type > &rank_c_i, std::vector< size_type > &rank_c_j) const
 Calculates for each symbol c in wt[i..j-1], how many times c occurs in wt[0..i-1] and wt[0..j-1].
size_type serialize (std::ostream &out, structure_tree_node *v=NULL, std::string name="") const
 Serializes the data structure into the given ostream.
void load (std::istream &in)
 Loads the data structure from the given istream.

Public Attributes

const size_type & sigma
const bit_vector_type & tree

Detailed Description

template<class BitVector = bit_vector, class RankSupport = typename BitVector::rank_1_type, class SelectSupport = typename BitVector::select_1_type, class SelectSupportZero = typename BitVector::select_0_type, bool dfs_shape = 0>
class sdsl::wt_huff< BitVector, RankSupport, SelectSupport, SelectSupportZero, dfs_shape >

A Wavelet Tree class for byte sequences.

A wavelet tree is build for a vector of characters over the alphabet $\Sigma$. This class should be used only for small alphabets $\Sigma \ll n$ (see int_wavelet_tree for a wavelet tree for big alphabets). The wavelet tree $wt$ consists of a tree of bitvectors and provides three efficient methods:

The idea of using a Huffman shaped wavelet was first mentioned on page 17 of the following technical report: Veli Mäkinen and Gonzalo Navarro: Succinct Suffix Arrays based on Run-Length Encoding. Available under: http://swp.dcc.uchile.cl/TR/2005/TR_DCC-2005-004.pdf

  \par Space complexity

$\Order{n H_0 + 2|\Sigma|\log n}$ bits, where $n$ is the size of the vector the wavelet tree was build for.


Constructor & Destructor Documentation

template<class BitVector = bit_vector, class RankSupport = typename BitVector::rank_1_type, class SelectSupport = typename BitVector::select_1_type, class SelectSupportZero = typename BitVector::select_0_type, bool dfs_shape = 0>
template<typename RandomAccessContainer >
sdsl::wt_huff< BitVector, RankSupport, SelectSupport, SelectSupportZero, dfs_shape >::wt_huff ( const RandomAccessContainer &  rac,
size_type  size 
) [inline]

Constructor.

   \param rac Reference to the vector (or unsigned char array) for which the wavelet tree should be build.
   \param size Size of the prefix of the vector (or unsigned char array) for which the wavelet tree should be build.
   \par Time complexity

$ \Order{n\log|\Sigma|}$, where $n=size$


Member Function Documentation

template<class BitVector = bit_vector, class RankSupport = typename BitVector::rank_1_type, class SelectSupport = typename BitVector::select_1_type, class SelectSupportZero = typename BitVector::select_0_type, bool dfs_shape = 0>
template<class size_type_class >
void sdsl::wt_huff< BitVector, RankSupport, SelectSupport, SelectSupportZero, dfs_shape >::construct ( int_vector_file_buffer< 8, size_type_class > &  rac,
size_type  size 
) [inline]

Construct the wavelet tree from a random access container.

Parameters:
racA random access container
sizeThe length of the prefix of the random access container, for which the wavelet tree should be build
template<class BitVector = bit_vector, class RankSupport = typename BitVector::rank_1_type, class SelectSupport = typename BitVector::select_1_type, class SelectSupportZero = typename BitVector::select_0_type, bool dfs_shape = 0>
void sdsl::wt_huff< BitVector, RankSupport, SelectSupport, SelectSupportZero, dfs_shape >::interval_symbols ( size_type  i,
size_type  j,
size_type &  k,
std::vector< unsigned char > &  cs,
std::vector< size_type > &  rank_c_i,
std::vector< size_type > &  rank_c_j 
) const [inline]

Calculates for each symbol c in wt[i..j-1], how many times c occurs in wt[0..i-1] and wt[0..j-1].

   \param i The start index (inclusive) of the interval.
   \param j The end index (exclusive) of the interval.
   \param k Reference that will contain the number of different symbols in wt[i..j-1].
Parameters:
csReference to a vector of size k that will contain all symbols that occur in wt[i..j-1] in arbitrary order.
rank_c_iReference to a vector which equals rank_c_i[p] = rank(i,cs[p]), for $ 0 \leq p < k $
rank_c_jReference to a vector which equals rank_c_j[p] = rank(j,cs[p]), for $ 0 \leq p < k $
Time complexity
$ \Order{\min{\sigma, k \log \sigma}} $
Precondition
$ i\leq j $ $ cs.size() \geq \sigma $ $ rank_{c_i}.size() \geq \sigma $ $ rank_{c_j}.size() \geq \sigma $
template<class BitVector = bit_vector, class RankSupport = typename BitVector::rank_1_type, class SelectSupport = typename BitVector::select_1_type, class SelectSupportZero = typename BitVector::select_0_type, bool dfs_shape = 0>
value_type sdsl::wt_huff< BitVector, RankSupport, SelectSupport, SelectSupportZero, dfs_shape >::operator[] ( size_type  i) const [inline]

Recovers the ith symbol of the original vector.

Parameters:
iThe index of the symbol in the original vector. $i \in [0..size()-1]$
Returns:
The ith symbol of the original vector.
Time complexity
$ \Order{H_0} $ on average, where $ H_0 $ is the zero order entropy of the sequence.
template<class BitVector = bit_vector, class RankSupport = typename BitVector::rank_1_type, class SelectSupport = typename BitVector::select_1_type, class SelectSupportZero = typename BitVector::select_0_type, bool dfs_shape = 0>
size_type sdsl::wt_huff< BitVector, RankSupport, SelectSupport, SelectSupportZero, dfs_shape >::rank ( size_type  i,
value_type  c 
) const [inline]

Calculates how many symbols c are in the prefix [0..i-1] of the supported vector.

Parameters:
iThe exclusive index of the prefix range [0..i-1], so $i\in[0..size()]$.
cThe symbol to count the occurrences in the prefix.
Returns:
The number of occurrences of symbol c in the prefix [0..i-1] of the supported vector.
Time complexity
$ \Order{H_0} $
template<class BitVector = bit_vector, class RankSupport = typename BitVector::rank_1_type, class SelectSupport = typename BitVector::select_1_type, class SelectSupportZero = typename BitVector::select_0_type, bool dfs_shape = 0>
size_type sdsl::wt_huff< BitVector, RankSupport, SelectSupport, SelectSupportZero, dfs_shape >::rank_ith_symbol ( size_type  i,
value_type &  c 
) const [inline]

Calculates how many occurrences of symbol wt[i] are in the prefix [0..i-1] of the original sequence.

   \param i The index of the symbol.
Parameters:
cReference that will contain symbol wt[i].
Returns:
The number of occurrences of symbol wt[i] in the prefix [0..i-1]
Time complexity
$ \Order{H_0} $
template<class BitVector = bit_vector, class RankSupport = typename BitVector::rank_1_type, class SelectSupport = typename BitVector::select_1_type, class SelectSupportZero = typename BitVector::select_0_type, bool dfs_shape = 0>
size_type sdsl::wt_huff< BitVector, RankSupport, SelectSupport, SelectSupportZero, dfs_shape >::select ( size_type  i,
value_type  c 
) const [inline]

Calculates the ith occurrence of the symbol c in the supported vector.

Parameters:
iThe ith occurrence. $i\in [1..rank(size(),c)]$.
cThe symbol c.
Time complexity
$ \Order{H_0} $

The documentation for this class was generated from the following file: