Building a Large Knowledge Base from a Structured Source: The CIA World Fact Book

Reference: Frank, G.; Farquhar, A.; & Fikes, R. Building a Large Knowledge Base from a Structured Source: The CIA World Fact Book. Knowledge Systems Laboratory, April, 1998.

Abstract: The on-line world is populated by an increasing number of knowledge-rich resources. Furthermore, there is a growing trend among authors to provide semantic markup of these resources. This presents a tantalizing prospect. Perhaps we can leverage the person-years of effort invested in building these knowledge-rich resources to create large-scale knowledge bases.

The World Fact Book knowledge base has been an experiment in constructing a large-scale knowledge base from a source authored using semantic markup. The content of the knowledge base is, in large part, derived from the CIA World Fact Book, and covers a broad range of information about the world's nations. The World Fact Book is a highly structured document with a complex underlying ontology. This structure makes it possible to parse the document in order to carry out the knowledge extraction. However, irregularities in the human-authored text and the complexity of the domain make the extraction process non-trivial.
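To make the parsing step concrete, here is a minimal sketch of turning "Field: value" lines into (relation, country, value) facts while tolerating one kind of human-authored irregularity: numbers written with thousands separators and trailing notes. The entry layout, field names, and the functions `parse_entry` and `parse_number` are hypothetical illustrations, not the authors' parser or the actual 1998 source format.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical Fact Book-style entry; layout and field names are assumptions.
SAMPLE_ENTRY = """\
Country: Exampleland
Capital: Example City
Area: total: 1,234,567 sq km
Population: 12,345,678 (July 1997 est.)
"""

# First number in a string, allowing comma thousands separators and decimals.
NUMBER = re.compile(r"(\d[\d,]*(?:\.\d+)?)")


def parse_number(text: str) -> Optional[float]:
    """Extract the first number from free text, or None if there is none."""
    match = NUMBER.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def parse_entry(entry: str) -> List[Tuple[str, str, object]]:
    """Turn 'Field: value' lines into (relation, country, value) facts."""
    fields: Dict[str, str] = {}
    for line in entry.splitlines():
        if ":" not in line:
            continue  # skip lines that do not follow the expected layout
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key.strip().lower()] = value.strip()
    country = fields.pop("country")
    facts: List[Tuple[str, str, object]] = []
    for key, value in fields.items():
        number = parse_number(value)
        # Values containing a number become quantities; others stay strings.
        facts.append((key, country, number if number is not None else value))
    return facts


for fact in parse_entry(SAMPLE_ENTRY):
    print(fact)
```

Even this toy version shows why extraction is non-trivial: "Area" carries a nested "total:" label and a unit, and "Population" carries a parenthetical estimate date, so a real extractor needs per-field rules rather than one generic pattern.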

We describe the process we used to construct the World Fact Book knowledge base, including parsing the source, refining the implicit knowledge, constructing a substantial supporting ontology, and reusing existing ontologies. We also discuss some of the key representational issues addressed and show how the resulting axioms can be used to answer a variety of queries.
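As an illustration of how extracted facts plus a rule can answer a query that no single fact states directly, the following sketch defines population density as a derived relation and uses it to find the most densely populated country. The fact table, relation names such as `area-sq-km`, and the helper functions are hypothetical; the paper's own axioms are written in its neutral representational format, not in Python.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical ground facts: (relation, country) -> numeric value.
FACTS: Dict[Tuple[str, str], float] = {
    ("area-sq-km", "Exampleland"): 1_000_000.0,
    ("population", "Exampleland"): 25_000_000.0,
    ("area-sq-km", "Otherland"): 50_000.0,
    ("population", "Otherland"): 10_000_000.0,
}


def lookup(relation: str, country: str) -> Optional[float]:
    """Return the stored value for a ground fact, or None if it is absent."""
    return FACTS.get((relation, country))


def population_density(country: str) -> Optional[float]:
    """Derived relation: population / area, defined only when both facts exist."""
    population = lookup("population", country)
    area = lookup("area-sq-km", country)
    if population is None or area is None or area == 0:
        return None
    return population / area


def densest(countries: Iterable[str]) -> Optional[str]:
    """Answer 'which country is most densely populated?' over known facts."""
    known = [(c, population_density(c)) for c in countries]
    known = [(c, d) for c, d in known if d is not None]
    if not known:
        return None
    return max(known, key=lambda pair: pair[1])[0]


print(densest(["Exampleland", "Otherland", "Nowhere"]))
```

Countries with missing facts are simply skipped rather than causing an error, which mirrors the practical situation where some entries in a large extracted knowledge base are incomplete.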

We hope that the broad accessibility of the resulting knowledge base and its neutral representational format will enable others to work with and extend the content, as well as explore issues of structuring and inferencing in large-scale knowledge bases.


