On-Line Learning of Predictive Compositional Hierarchies

Karl Pfleger
Department of Computer Science
Stanford University

Abstract: Language, music, spatial configurations, event chronologies, action sequences, and many other types of data exhibit hierarchical compositional structure, in which high-level entities represent combinations of lower-level entities. Compositional (or part-whole) relationships, like taxonomic (is-a) relationships, serve as critical components of representation for artificial intelligence. Existing work with hand-built compositional hierarchies demonstrates the ability of these structures to make predictive inferences that smoothly integrate bottom-up and top-down influences, spanning multiple levels of spatial or temporal resolution. However, unlike taxonomies, for which numerous basic learning algorithms exist, there has been no analogous foundational work on learning predictive compositional hierarchies. My research demonstrates that predictive compositional hierarchies can likewise be learned purely from primitive data in a way that is general, unsupervised, and on-line (incremental, seeing a little data at a time), with the ability at any point to make predictions about unseen data. I define a new, basic, and quite general framework for on-line learning from unsegmented sequence data. Systems that perform this task well have many potential uses arising from their ability to fill in missing data, resolve ambiguities, detect anomalies, and so on. I introduce two new learning systems for this task, both employing compositional hierarchies but based on different classes of traditional learning models. The first, based on symmetric recurrent neural networks with probabilistic semantics, provides a novel on-line structure-modification rule for such networks.
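To make the framework concrete, here is a minimal sketch (not the dissertation's system) of the on-line unsegmented-sequence task: a learner consumes one primitive symbol at a time and must be able, at any point, to predict symbols it has not yet seen. The interface names (`OnlineSequenceLearner`, `observe`, `predict_next`) and the bigram baseline are illustrative assumptions, not part of the original work.

```python
from abc import ABC, abstractmethod

class OnlineSequenceLearner(ABC):
    """Hypothetical interface for the on-line task: see one symbol at a
    time, and be ready at any point to predict unseen data."""

    @abstractmethod
    def observe(self, symbol):
        """Consume the next primitive symbol from the unsegmented stream."""

    @abstractmethod
    def predict_next(self):
        """Return a dict mapping candidate next symbols to probabilities."""

class BigramLearner(OnlineSequenceLearner):
    """Minimal baseline: first-order (bigram) counts, add-one smoothing."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.counts = {a: {b: 0 for b in alphabet} for a in alphabet}
        self.prev = None

    def observe(self, symbol):
        if self.prev is not None:
            self.counts[self.prev][symbol] += 1
        self.prev = symbol

    def predict_next(self):
        if self.prev is None:  # nothing seen yet: uniform prediction
            n = len(self.alphabet)
            return {s: 1.0 / n for s in self.alphabet}
        row = self.counts[self.prev]
        total = sum(row.values()) + len(self.alphabet)  # add-one smoothing
        return {s: (c + 1) / total for s, c in row.items()}

learner = BigramLearner("ab")
for ch in "abababab":
    learner.observe(ch)
dist = learner.predict_next()  # after 'b', 'a' is most likely
```

A compositional-hierarchy learner would fill the same interface but build higher-level patterns rather than fixed-order statistics.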
The second, based on n-grams, makes several contributions, including an on-line method for selecting and storing only high-frequency patterns, a method for weighting the statistical reliability of empirical frequency estimates of different ages, and two novel methods for utilizing lower-order models when combining n-grams.

The essence of on-line compositional hierarchy learning is the bottom-up identification of frequently recurring patterns in the data, which in turn enables the discovery of even larger patterns. Both systems can compose larger and larger patterns (or chunks) as they see more data, with no prespecified bound, yet use less storage than the data itself. This hierarchical process has the potential to scale automatically from fine-grained, low-level data to coarser, high-level representations tuned to the statistical characteristics of the environment, thereby bridging a gap that has proved to be one of the largest stumbling blocks on the way to creating significantly more complex and intelligent autonomous agents.
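The bottom-up chunking idea described above can be sketched in a few lines. This is an illustrative toy, not the dissertation's algorithm: it counts adjacent pairs of units on-line, promotes a pair to a chunk once its count crosses a (hypothetical) frequency threshold, and thereafter treats occurrences of that chunk as single units, so that still-larger patterns can accumulate counts in turn.

```python
from collections import Counter

class ChunkLearner:
    """Toy sketch of bottom-up chunk discovery: frequent adjacent pairs
    are promoted to chunks, enabling composition of larger patterns."""

    def __init__(self, threshold=3):
        self.threshold = threshold    # promotion threshold (assumption)
        self.pair_counts = Counter()  # counts over current units
        self.chunks = set()           # learned composite patterns
        self.prev = None              # previous (possibly composite) unit

    def observe(self, symbol):
        merged = (self.prev or "") + symbol
        if self.prev and merged in self.chunks:
            self.prev = merged        # recognized a learned chunk:
            return                    # treat it as a single unit
        if self.prev is not None:
            self.pair_counts[merged] += 1
            if self.pair_counts[merged] >= self.threshold:
                self.chunks.add(merged)  # promote frequent pair to chunk
        self.prev = symbol

learner = ChunkLearner(threshold=3)
for ch in "abcabcabcabc":
    learner.observe(ch)
# pairs such as "ab" get promoted to chunks, after which the unit "ab"
# followed by "c" starts accumulating counts toward the larger chunk "abc"
```

Because promoted chunks immediately become units for further counting, the vocabulary of patterns can keep growing with the data, with no bound fixed in advance, which is the scaling behavior the abstract describes.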