GAP (orb) - Chapter 8: AVL trees

AVL trees are balanced binary trees called AVL trees in honour of their inventors G.M. Adelson-Velskii and E.M. Landis (see [AM62]). A description in English can be found in [Knu97] in Section 6.2.3 about balanced trees.

The general idea is to store data in a binary tree such that all entries in the left subtree of a node are smaller than the entry at the node and all entries in the right subtree are bigger. The tree is kept balanced which means that for each node the depth of the left and right subtrees differs by at most 1. In this way, finding something in the tree, adding a new entry, deleting an entry all have complexity \(log(n)\) where \(n\) is the number of entries in the tree. If one additionally stores the number of entries in the left subtree of each node, then finding the \(k\)-th entry, removing the \(k\)-th entry and inserting an entry in position \(k\) also have complexity \(log(n)\). The orb contains an implementation of such tree objects providing all these operations.

Entries in AVL tree objects are key-value pairs and the sorting is done by the key. If all values as true then no memory is needed to store the values (see the corresponding behaviour for hash tables). The only requirement on the type of the keys is that two arbitrary keys must be comparable in the sense that one can decide which of them is smaller. If GAPs standard comparison operations \(<\) and \(=\) work for your keys, no further action is required, if not, then you must provide your own three-way comparison function (see below).

Note that the AVL trees implemented here can be used in basically two different ways, which can sometimes be mixed: The usual way is by accessing entries by their key, the tree is then automatically kept sorted. The alternative way is by accessing entries by their index in the tree! Since the nodes of the trees remember how many elements are stored in their left subtree, it is in fact possible to access the \(k\)-th entry in the tree or delete it. It is even possible to insert something in position \(k\). However, note that if you do this latter operation, you are yourself responsible to keep the entries in the tree sorted. You can ignore this responsibility, but then you can no longer access the entries in the tree by their key and the corresponding functions might fail or even run into errors.

This usage can be useful, since in this way AVL trees provide an implementation of a list data structure where the operation list access (by index), adding an element (in an arbitrary position) and deleting an element (by its index) all have complexity \(log(n)\) where \(n\) is the number of entries in the list.

8.2 Using AVL trees

8.2-1 AVLTree

‣ AVLTree( [opt] ) ( function )

This function creates a new AVL tree object. The optional argument opt is an options record, in which you can bind the following components:

cmpfunc is a three-way comparison function taking two arguments a and b and returning \(-1\) if \(\textit{a} < \textit{b}\), \(+1\) if \(\textit{a} > \textit{b}\) and \(0\) if \(\textit{a} = \textit{b}\). If no function is given then the generic function AVLCmp (8.2-2) is taken. This three-way comparison function is stored with the tree and is used for all comparisons in tree operations. allocsize is the number of nodes which are allocated for the tree initially. It can be useful to specify this if you know that your tree will eventually contain a lot of entries, since then the tree object does not have to grow that many times.

For every AVL tree a three-way comparison function is needed, usually you can get away with using the following default one:

8.2-2 AVLCmp

‣ AVLCmp( a, b ) ( function )

This function calls the < operation and the = operation to provide a generic three-way comparison function to be used in AVL tree operations. See AVLTree (8.2-1) for a description of the return value. This function is implemented in the kernel and should be particularly fast.

8.2-3 AVLAdd

‣ AVLAdd( t, key, val ) ( function )

The first argument t must be an AVL tree. This function stores the key key with value value in the tree assuming that the keys in it are sorted according to the three-way comparison function stored with the tree. If value is true then no additional memory is needed. It is an error if there is already a key equal to key in the tree, in this case the function returns fail. Otherwise it returns true.

8.2-4 AVLLookup

‣ AVLLookup( t, key ) ( function )

The first argument t must be an AVL tree. This function looks up the key key in the tree and returns the value which is associated to it. If the key is not in the tree, the value fail is returned. This function assumes that the keys in the tree are sorted according to the three-way comparison function stored with the tree.

8.2-5 AVLDelete

‣ AVLDelete( t, key ) ( function )

The first argument t must be an AVL tree. This function looks up the key key in the tree, deletes it and returns the value which was associated with it. If key is not contained in the tree then fail is returned. This function assumes that the keys in the tree are sorted according to the three-way comparison function stored with the tree.

8.2-6 AVLFindIndex

‣ AVLFindIndex( t, key ) ( function )

The first argument t must be an AVL tree. This function looks up the key key in the tree and returns the index, under which it is stored in the tree. This index is one-based, that is, it takes values from 1 to the number of entries in the tree. If key is not contained in the tree then fail is returned. This function assumes that the keys in the tree are sorted according to the three-way comparison function stored with the tree.

8.2-7 AVLIndex

‣ AVLIndex( t, index ) ( function )

The first argument t must be an AVL tree. This function returns the key at index index in the tree, so index must be an integer in the range 1 to the number of elements in the tree. If the value is out of these bounds, fail is returned. Note that to use this function it is not necessary that the keys in the tree are sorted according to the three-way comparison function stored with the tree.

8.2-8 AVLIndexLookup

‣ AVLIndexLookup( t, index ) ( function )

The first argument t must be an AVL tree. This function returns the value associated to the key at index index in the tree, so index must be an integer in the range 1 to the number of elements in the tree. If the value is out of these bounds, fail is returned. Note that to use this function it is not necessary that the keys in the tree are sorted according to the three-way comparison function stored with the tree.

8.2-9 AVLIndexAdd

‣ AVLIndexAdd( t, key, value, index ) ( function )

The first argument t must be an AVL tree. This function inserts the key key at index index in the tree and associates the value value with it. If value is true then no additional memory is needed to store the value. The index index must be an integer in the range 1 to \(n+1\) where \(n\) is the number of entries in the tree. The new key is inserted before the key which currently is stored at index index, so calling with index equal to \(n+1\) puts the new key at the end. If index is not in the corrent range, this function returns fail and the tree remains unchanged.

Caution: With this function it is possible to put a key into the tree at a position such that the keys in the tree are no longer sorted according to the three-way comparison function stored with the tree! If you do this, the functions AVLAdd (8.2-3), AVLLookup (8.2-4), AVLDelete (8.2-5) and AVLFindIndex (8.2-6) will no longer work since they assume that the keys are sorted!

8.2-10 AVLIndexDelete

‣ AVLIndexDelete( t, index ) ( function )

The first argument t must be an AVL tree. This function deletes the key at index index in the tree and returns the value which was associated with it.

8.2-11 AVLFind

‣ AVLFind( t, key ) ( function )

The first argument t must be an AVL tree. This function locates the key key in the tree and returns the position in the positional object, at which the node which contains the key is stored. This position will always be divisible by 4. Use the functions AVLData (8.2-13) and AVLValue (8.2-14) to access the key and value of the node respectively. The function returns fail if the key is not found in the tree. This function assumes that the keys in the tree are sorted according to the three-way comparison function stored with the tree.

8.2-12 AVLIndexFind

‣ AVLIndexFind( t, index ) ( function )

The first argument t must be an AVL tree. This function locates the index index in the tree and returns the position in the positional object, at which the node which hash this index is stored. This position will always be divisible by 4. Use the functions AVLData (8.2-13) and AVLValue (8.2-14) to access the key and value of the node respectively. The function returns fail if the key is not found in the tree. This function does not assume that the keys in the tree are sorted according to the three-way comparison function stored with the tree.

8.2-13 AVLData

‣ AVLData( t, pos ) ( function )

The first argument t must be an AVL tree and the second a position in the positional object corresponding to a node as returned by AVLFind (8.2-11). The function returns the key associated with this node.

8.2-14 AVLValue

‣ AVLValue( t, pos ) ( function )

The first argument t must be an AVL tree and the second a position in the positional object corresponding to a node as returned by AVLFind (8.2-11). The function returns the value associated with this node.

The following convenience methods for standard list methods are implemented for AVL tree objects:

8.2-15 Display

‣ Display( t ) ( method )

This function displays the tree in a user-friendly way. Do not try this with trees containing many nodes!

8.2-16 ELM_LIST

‣ ELM_LIST( t, index ) ( method )

This method allows for easy access to the key at index index in the tree using the square bracket notation t[index]. It does exactly the same as AVLIndex (8.2-7). This is to make AVL trees behave more like lists.

8.2-17 Position

‣ Position( t, key ) ( method )

This method allows to use the Position operation to locate the index at which the key key is stored in the tree. It does exactly the same as AVLFindIndex (8.2-6). This is to make AVL trees behave more like lists.

8.2-18 Add

‣ Add( t, key[, index] ) ( method )

This method allows to use the Add operation to add a key (with associated value true) to the tree at index index. It does exactly the same as AVLIndexAdd (8.2-9), so the same warning about sortedness as there applies! If index is omitted, the key is added at the end. This is to make AVL trees behave more like lists.

8.2-19 Remove

‣ Remove( t, index ) ( method )

This method allows to use the Remove operation to remove a key from the tree at index index. If index is omitted, the last key in the tree is remove. This method returns the deleted key or fail if the tree was empty. This is to make AVL trees behave more like lists.

8.2-20 Length

‣ Length( t ) ( method )

This method returns the number of entries stored in the tree t. This is to make AVL trees behave more like lists.

8.2-21 \in

‣ \in( key, t ) ( method )

This method tests whether or not the key key is stored in the AVL tree t. This is to make AVL trees behave more like lists.

8.3 The internal data structures

An AVL tree is a positional object in which the first 7 positions are used for administrative data (see table below) and then from position 8 on follow the nodes of the tree. Each node uses 4 positions such that all nodes begin at positions divisible by 4. The system allocates the positional object larger than actually needed such that not every new node leads to the object being copied. Nodes which become free are collected in a free list. The following table contains the information what is stored in each of the first 7 entries:

1	last actually used position, is always congruent 3 mod 4
2	position of first node in free list
3	number of currently used nodes in the tree
4	position of largest allocated position is always congruent 3 mod 4
5	three-way comparison function
6	position of the top node
7	a plain list holding the values stored under the keys

The four positions used for a node contain the following information, recall that each node starts at a position divisible by 4:

0 mod 4	reference to the key
1 mod 4	position of left node or 0 if empty, balance factor (see below)
2 mod 4	position of right node or 0 if empty
3 mod 4	index: number of nodes in left subtree plus one

Since all positions of nodes are divisible by 4, we can use the least significant two bits of the left node reference for the so called balance factor. Balance factor 0 (both bits 0) indicates that the depth of the left subtree is equal to the depth of the right subtree. Balance factor 1 (bits 01) indicates that the depth of the right subtree is one greater than the depth of the left subtree. Balance factor 2 (or -1 in [Knu97], here bits 10) indicates that the depth of the left subtree is one greater than the depth of the right subtree.

For freed nodes the position of the next free node in the free list is held in the 0 mod 4 position and 0 means the end of the free list.

Position 7 in the positional object can contain the value fail, in this case all stored values are true. This is a measure to limit the memory usage in the case that the only relevant information in the tree is the key and no values are stored there. This is in particular interesting if the tree structure is just used as a list implementation.

Note that all functions dealing with AVL trees are both implemented on the GAP level and on the kernel level. Both implementations do exactly the same thing, the kernel version is only much faster and tuned for efficiency whereas the GAP version documents the functionality better and is used as a fallback if the C-part of the orb is not compiled.

8 AVL trees

8.1 The idea of AVL trees