Homework 3 Lessons Learned

DAML Ontology Generation

The first ontology generated for this homework assignment is the UNSPSC Product Ontology (http://ksl.stanford.edu/projects/DAML/UNSPSC.daml ; Size: 1.8 MB), which was obtained from www.unspsc.org . It is a slight modification of a previous version; anyone wishing to reuse this ontology should first obtain an updated version from UNSPSC. The original UNSPSC ontology was imported into the Ontolingua Knowledge Base Server (www.ontolingua.stanford.edu) and exported as DAML content.

The second ontology generated for this homework assignment is the CIA World Fact Book (http://ontolingua.stanford.edu/doc/chimaera/ontologies/cia-world-fact-book.daml ; Size: 3.4 MB), which was scraped from the web-based version of the Fact Book (http://www.odci.gov/cia/publications/factbook/) and loaded into Ontolingua. The scraping was done over two years ago, and the content was processed to meet our needs in the DARPA HPKB program (www.darpa.mil); we therefore suggest obtaining an updated copy of the Fact Book content before using this ontology. Ontolingua was then used to export the DAML content.

The third ontology for this assignment (http://ksl.stanford.edu/projects/DAML/chimaera-jtp-cardinality-test1.daml ; Size: 25 KB) defines the object structure for a diagnostic interface between JTP (a Java-based theorem prover) and Chimaera (an online KB diagnostic tool; www.ksl.stanford.edu/software/chimaera ). It is used to test the inferential power of the reasoner; the initial knowledge base tests only the cardinality portion of the inferential work required for DAML.
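As an illustration of the kind of cardinality construct such a test exercises (this is a hypothetical sketch, not an excerpt from the actual test knowledge base), a DAML+OIL cardinality restriction looks roughly like this:

```xml
<!-- Hypothetical example: a Person has exactly two values
     for the hasParent property. Class and property names
     are invented for illustration. -->
<daml:Class rdf:ID="Person">
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#hasParent"/>
      <daml:cardinality>2</daml:cardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
</daml:Class>
```

A reasoner that handles cardinality correctly should, for example, infer that two distinct fillers of hasParent exhaust the allowed values for an instance of Person.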

DAML Importing/Exporting Issues

Although we did not encounter any direct problems exporting the above knowledge bases as DAML+OIL, a few general DAML+OIL language issues arose during the development of our DAML importing and exporting tools. These issues are listed below:

Related KSL Work for Ontology Generation

We have also gained experience generating "instance oriented" content for related projects by scraping HTML pages for web-services content. We thought it would be of value to the DAML community to share some of our experience with this process.

A common approach to information extraction from HTML pages is to use syntactic rules and pattern matching to retrieve structured or semi-structured information. Two software packages available for such syntactic content scraping are W4F (http://www.tropea-inc.com/technology/W4F/) and Compaq's Web Language, formerly called WebL (http://research.compaq.com/SRC/WebL). The following is a brief description of the pros and cons of each language, based on an evaluation we performed in the fall. Note that both software products are rapidly evolving, so some of our comparisons may already be out of date.
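To make the general idea concrete, here is a minimal sketch of syntactic pattern-matching extraction in Python. The HTML fragment and field names are invented for illustration; they are not taken from the actual Fact Book pages, and neither W4F nor WebL is involved here:

```python
import re

# Hypothetical HTML table fragment standing in for a scraped page.
html = """
<tr><td>Country</td><td>Population</td></tr>
<tr><td>Andorra</td><td>67,627</td></tr>
<tr><td>Belize</td><td>249,183</td></tr>
"""

# A syntactic rule: match each two-cell table row and capture both cells.
row_pattern = re.compile(r"<tr><td>(.*?)</td><td>(.*?)</td></tr>")
rows = row_pattern.findall(html)

# Skip the header row and normalize the numeric field.
records = {name: int(pop.replace(",", "")) for name, pop in rows[1:]}
print(records)  # -> {'Andorra': 67627, 'Belize': 249183}
```

Tools such as W4F and WebL provide higher-level languages for expressing these extraction rules, but the underlying technique is the same: exploit the regular structure of the HTML markup rather than its meaning.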

Compaq's Web Language (formerly called WebL):

Some Notable Features:




W4F (World Wide Web Wrapper Factory):

Some Notable Features:





Copyright ©2005 Stanford University
All Rights Reserved.

Last modified: Sunday, 03-Jul-2005 06:07:02 PDT