Reference: Golding, A. R. Pronouncing Names by a Combination of Rule-Based and Case-Based Reasoning. Knowledge Systems Laboratory, October, 1991.
Abstract: A novel architecture is presented for improving the accuracy of a rule-based system through case-based reasoning. The central idea is to use the rules to generate an approximate answer to the target problem, and to use cases to handle exceptions to the rules. This provides a way of enhancing an imperfect rule set with relatively little knowledge-engineering effort, since obtaining cases is often much easier than the alternative of extending and tuning the rules. The architecture has been applied to the task of pronouncing surnames, and has been found to achieve an accuracy in the ballpark of the best commercial name-pronunciation systems. The architecture is structured as a core method and a set of support modules. The core method is the part that actually solves problems. It incorporates two key ideas: prediction-based indexing, a way of indexing cases to make them accessible for improving the rules; and the compellingness predicate, which combines the results of rule-based and case-based reasoning. The role of the support modules is to convert the knowledge inputs of the architecture into a form that can be used directly by the core method. There are three support modules: rational reconstruction, theory extension, and threshold setting. Rational reconstruction infers the solution path for each case in the case library, given just the problem and final answer for the case. Theory extension suggests additions to the architecture's rule set to cover noticeable gaps in the rules. Threshold setting uses a learning procedure to choose values for a set of thresholds that are used by the core method. Instantiating the architecture for name pronunciation results in Anapron, a hybrid RBR/CBR system for pronouncing names.
Anapron required two principal extensions to the architecture: similarity-based indexing, an auxiliary indexing scheme to help the system cope with the large case library involved (5000 cases); and positive [...]. A variety of experimental results were collected for Anapron. One was a demonstration of the analogical decline: good analogies are somewhat harder to find for rare names than for common ones. A second result, mentioned above, was that Anapron was found to perform in the ballpark of the best commercial name-pronunciation systems. However, of more interest than the absolute performance of the system is a third result, which was that this performance was better than what the system could have achieved with its rules alone. This illustrates the capacity of the architecture to improve on the rule-based system that it starts with.
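
Illustrative sketch (not from the thesis): the core method described in the abstract can be pictured roughly as follows. The rules propose an answer, cases are filed under the rule that fires on them (a toy reading of prediction-based indexing), and a stored exception overrides the rule only when a compellingness test passes. Everything concrete here is an assumption made for illustration: the names `Rule`, `Case`, `build_index`, `is_compelling` and `solve`, the shared-suffix similarity metric, the two thresholds and their values, and the toy "ch" data. None of it is Anapron's actual rule set, metric, or predicate.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    applies: Callable[[str], bool]
    answer: str


@dataclass(frozen=True)
class Case:
    problem: str
    answer: str


def fire(rules, problem):
    """Return the first rule (in order) that applies to the problem."""
    for rule in rules:
        if rule.applies(problem):
            return rule
    raise ValueError(f"no rule applies to {problem!r}")


def similarity(a, b):
    """Toy metric (assumption): shared-suffix length over the longer length."""
    a, b = a.lower(), b.lower()
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n / max(len(a), len(b))


def build_index(rules, cases):
    """File each case under the rule that fires on it, split by whether
    the rule's answer agrees with the case's known answer."""
    index = defaultdict(lambda: {"agree": [], "exception": []})
    for case in cases:
        rule = fire(rules, case.problem)
        kind = "agree" if rule.answer == case.answer else "exception"
        index[rule.name][kind].append(case)
    return index


def is_compelling(target, exemplar, entry, min_similarity, min_accuracy):
    """Hypothetical compellingness test: the target must be similar enough
    to the exemplar, and library cases similar to the exemplar must mostly
    share its answer."""
    if similarity(target, exemplar.problem) < min_similarity:
        return False
    neighbours = [
        c for c in entry["agree"] + entry["exception"]
        if c != exemplar
        and similarity(c.problem, exemplar.problem) >= min_similarity
    ]
    if not neighbours:
        return False
    matching = sum(c.answer == exemplar.answer for c in neighbours)
    return matching / len(neighbours) >= min_accuracy


def solve(target, rules, index, min_similarity=0.5, min_accuracy=0.6):
    """Use the rule's answer unless a compelling exception overrides it."""
    rule = fire(rules, target)
    entry = index.get(rule.name, {"agree": [], "exception": []})
    candidates = sorted(
        entry["exception"],
        key=lambda c: similarity(target, c.problem),
        reverse=True,
    )
    for exemplar in candidates:
        if is_compelling(target, exemplar, entry, min_similarity, min_accuracy):
            return exemplar.answer
    return rule.answer


# Toy data (assumption): one rule for "ch" plus a handful of cases.
RULES = [Rule("ch-default", lambda name: "ch" in name.lower(), "CH")]
CASES = [
    Case("Chapman", "CH"), Case("Church", "CH"),
    Case("Bach", "K"), Case("Koch", "K"), Case("Mach", "K"),
]
INDEX = build_index(RULES, CASES)

print(solve("Rach", RULES, INDEX))      # exception "Bach" overrides the rule
print(solve("Chadwick", RULES, INDEX))  # no similar exception, rule stands
```

In this sketch the two thresholds are fixed by hand; the abstract's threshold-setting module would instead choose such values with a learning procedure.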