At a glance

Presage:

offers multiple language models and predictive algorithms, and uses context to generate relevant predictions
is implemented in C++
learns while predicting, and can be trained on users' generated text for additional accuracy
supports any natural language, since its prediction engine can be trained on any text corpus
builds on Linux, Windows, MacOS X, Solaris, Maemo, etc.
is free software
is licensed under the GPL
has a core architecture designed to ease the addition and integration of novel predictive algorithms
uses XML configuration profiles to determine its runtime behaviour and predictive functionality
provides native bindings for C++, C and Python, and supports other languages via a D-Bus service

Presage predictors

Predictor (status): description
generalized smoothed n-gram statistical predictor: Relies on a language model (an n-gram database generated from a text corpus using the text2ngram tool) to compute the most probable prediction given the current context. For example, for a 3-gram language model, the probability of each possible prediction token is computed according to this formula:
P(w3 | w1, w2) = a * f(w3 | w1, w2) + b * f(w3 | w2) + c * f(w3)
where a, b, and c are configurable smoothing parameters and:
f(w3 | w1, w2) = C(w1, w2, w3) / C(w1, w2)
f(w3 | w2) = C(w2, w3) / C(w2)
f(w3) = C(w3) / C
C(w1, ..., wn) = count of n-gram <w1, ..., wn>
C = sum of all token counts in the database
ARPA predictor (DONE): Enables the use of statistical language modelling data in the ARPA N-gram format. In the ARPA format, each N-gram is stored with its discounted log probability and its Katz backoff weight. Probabilities are estimated by applying Katz backoff smoothing to the maximum likelihood estimates based on n-gram count data.
recency predictor (DONE): Based on the recency promotion principle, this predictor assigns each previously encountered token a probability that decays exponentially with its distance from the current token, thereby promoting recent context.
dictionary predictor (DONE): Generates a prediction by returning, in alphabetical order, the tokens that complete the current prefix. Its predictive accuracy is very low; it is meant as an example for getting started with developing predictors for presage.
abbreviation expansion predictor (DONE): Maps the current prefix to a token and returns that token in a prediction with probability 1.0. The abbreviations are configurable and extensible.
dejavu predictor (DONE): Learns and later reproduces previously seen text sequences. The goal of the dejavu predictor is to provide a simple predictor that demonstrates how learning can be implemented in the presage system and exposes what functionality learning predictors require to work within the presage framework.
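
The smoothed 3-gram formula given for the generalized smoothed n-gram predictor can be illustrated with a short Python sketch. This is not presage's implementation: the function names, the toy corpus, and the weights a=0.6, b=0.3, c=0.1 are assumptions chosen for illustration, and the counts are held in memory rather than in a database produced by text2ngram.

```python
from collections import Counter


def ngram_counts(tokens, max_n=3):
    """Count every n-gram of order 1..max_n in a list of tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def rel_freq(counts, ngram, total):
    """f(wn | w1..wn-1) = C(w1..wn) / C(w1..wn-1); a unigram is divided by C."""
    denom = counts[ngram[:-1]] if len(ngram) > 1 else total
    return counts[ngram] / denom if denom else 0.0


def smoothed_probability(counts, w1, w2, w3, a, b, c):
    """P(w3 | w1, w2) = a*f(w3 | w1, w2) + b*f(w3 | w2) + c*f(w3)."""
    # C: sum of all token (unigram) counts
    total = sum(v for k, v in counts.items() if len(k) == 1)
    return (a * rel_freq(counts, (w1, w2, w3), total)
            + b * rel_freq(counts, (w2, w3), total)
            + c * rel_freq(counts, (w3,), total))


# Toy corpus and hypothetical smoothing weights
corpus = "the cat sat on the mat the cat ran".split()
counts = ngram_counts(corpus)
p = smoothed_probability(counts, "the", "cat", "sat", a=0.6, b=0.3, c=0.1)
print(p)
```

Because the lower-order terms f(w3 | w2) and f(w3) still contribute when the full trigram was never seen, a candidate token keeps a non-zero probability as long as it occurs somewhere in the corpus.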

Presage bindings

Language (status): notes
C++ (DONE): Native binding. Simply #include the header file.
C (DONE): Simply #include the header file.
Python (DONE): Import the presage Python module with import presage, then create an instance of presage with prsg = presage.Presage(config). You are now ready to generate predictions with prsg.predict(string). The entire presage API defined in presage.h has been mapped and is available from Python.
.NET C# (DONE): Define suitable callback methods, import the presage.net assembly with using presage, then create an instance of presage with Presage prsg = new Presage (demo.callback_get_past_stream, demo.callback_get_future_stream, "presage_csharp_demo.xml"); and generate predictions with prsg.predict(). The entire presage API defined in presage.h is available in .NET.

Presage core modules

Module (status): description
ContextChangeDetector (DONE): Detects context changes in streams.
ContextTracker (DONE): Keeps track of the user's input. Responds to predictors' queries regarding the context.
Tokenizer (DONE): Breaks up the context into tokens.
Selector (DONE): Determines which suggestions should be returned as a prediction. Decisions are based on configurable parameters.
PredictorRegistry (DONE): Manages instantiation of and iteration through predictors; aids in generating predictions and in learning.
PredictorActivator (DONE, more TODO): Coordinates individual predictors.
Combiner (DONE, more combiners to come): Merges the suggestions returned by different predictors into a single prediction.
ProfileManager (DONE): Manages configuration profiles.
Configuration (DONE): Allows querying and modifying all presage runtime configuration variables, and notifies listeners of configuration variable changes.
Profile (DONE): Represents a configuration profile at runtime.
PresageCallback (DONE): Provides the interface that applications using presage must implement to hook presage into the application's textual context.
Presage (DONE): Provides the entry point for user applications to presage's predictive text functionality.
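
To make the roles of the Combiner and the Selector more concrete, here is a minimal Python sketch of the idea. It is a hypothetical illustration only: the merge rule (summing probabilities per token and renormalising), the max_suggestions parameter, and the function names are assumptions, not the combiners or selection parameters that presage actually ships.

```python
from collections import defaultdict


def combine(predictions):
    """Merge several predictors' (token, probability) lists into one ranked list.

    Hypothetical rule: probabilities for the same token are summed, then the
    merged scores are renormalised so that they add up to 1.
    """
    merged = defaultdict(float)
    for suggestions in predictions:
        for token, prob in suggestions:
            merged[token] += prob
    total = sum(merged.values())
    if not total:
        return []
    # Highest probability first; ties are broken alphabetically
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(token, prob / total) for token, prob in ranked]


def select(prediction, max_suggestions):
    """Keep only the top-ranked tokens (max_suggestions is a made-up parameter)."""
    return [token for token, _ in prediction[:max_suggestions]]


# Made-up output of two predictors for the prefix "hel"
ngram_suggestions = [("hello", 0.5), ("help", 0.3)]
recency_suggestions = [("help", 0.4), ("held", 0.1)]

prediction = combine([ngram_suggestions, recency_suggestions])
print(select(prediction, max_suggestions=2))
```

Keeping merging and selection as separate steps mirrors the module split listed above: new merge strategies can be added without touching the logic that decides how many suggestions reach the user.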
