This document was created to help developers understand what Symtree does and how it works in order to make them more comfortable using it.
In this demonstration, we will be adding words to a dictionary. The default behavior of Symtree is to act like a triangular dictionary where all words are lower case. Symtree has compile options to allow it to store other kinds of data. This demonstration contains terminology specific to dictionaries.
First, assign an st_instance_ptr a symbol tree using create_symtree(). Never assign st_instance_ptr NULL because it won't work properly with functions in the library. create_symtree() will create an empty symbol tree (in this case, dictionary) with no symbols (words) in it.
You will notice that there are 26 pointers. Each pointer represents a letter. In this case, since it is the root node, each pointer represents to first letter of a word. If we mark this node as end-of-symbol, the null word would be part of the dictionary. Athough it may not make sense for a dictionary to have a null word, it may be useful in some other applications. More on marking nodes as end-of-symbol later.
We are going to add the following words to our dictionary: 'artist', 'art' and 'arch'. There are two ways to add words to the dicionary with the API provided.
You can add a whole word at once with add_sym(st_instance_ptr, char*). The string must only contain acceptable characters (in this case letters, upper case letters are automatically converted to lower case) and it must be null terminated. Let's add 'artist' to our dictionary with this method.
You also add words one character at a time using stream_add_sym(st_accessor_ptr, char). If you need to add words from a stream, this is what should be used. It allows you to not have to worry about buffering input. We must keep track of the where we are in adding words to the tree. We do this with an accessor, (much like a file accessor). You can create an accessor with create_st_accessor(st_instance_ptr).
Using this method, a word will automatically be added when a non-alphabetic character is encountered. You must be carefull to check that two non-alphabetic (ie, spaces) do not occur in a row if you do not wish to have the null word added to the dictionary.
Lets add the other two words to our dictionary by feeding stream_add_sym(st_accessor_ptr, char) the stream "art arch".
Woops. You will notice that 'arch' was not add as a word. That is because there was no space following 'arch' in the stream. The proper way to make sure the last word in a stream is added, is to use mark_sym(st_accessor_ptr).
mark_sym(st_accessor_ptr) will add the last word. It will also set the accessor to start a new word. mark_sym(st_accessor_ptr) will never add a null word, so it is always safe to use when in doubt (if we are not sure if the last character is a space). So let's add arch by using mark_sym(st_accessor_ptr).
Now lets try searching for words. There are two ways of doing this.
We can search for a whole word at once by feeding sym_search(st_instance_ptr, char*) a string. The string must containt (in this case) only alphabetic letters (and null terminated) or it will return __NON_ALPHA_ERROR. Returns 1 if word found or 0 if not found.
We can also check the state of the current word in a stream with stream_sym_state(st_accessor_ptr, char). Feeding it one character at time, it will return one of the following values:
We've covered the basics of how Symtree works and how to use it. Now let's clean up. Free up any memory use by accessors using free(). Free up any memory used by any symtree instances using destroy_symtree(st_instance_ptr).
peace,
Greg Valcourt
While we were hunting rabbits, I came upon a clear.