Outline
Knowledge on the Web:
XML
A Machine-Understandable Web
The Web was originally built for human consumption
Web pages are machine-readable but not machine-understandable
Example: A bibliography entry in HTML
<UL>
<LI>
R. Goldman, J. McHugh, and J. Widom.
<A href="ftp://db.stanford.edu/pub/papers/xml.ps">
From Semistructured Data to XML: Migrating the Lore Data Model and Query Language
</A>.
Proceedings of the 2nd International Workshop on the Web and Databases (WebDB '99), Philadelphia, Pennsylvania, June 1999.
</UL>
Extensible Markup Language (XML)
“XML is like HTML, where you make up your own tags.”
Provides a uniform method for describing and exchanging data, for example over HTTP
HTML provides a universal method for displaying data
Tag a word to be displayed in bold or italic
XML provides a universal method for describing data
Declare data to be a retail price, a sales tax, a book title, ...
Data is made up of characters or unparsed “entities”
Subset of Standard Generalized Markup Language (SGML)
Defined by the World Wide Web Consortium (W3C)
XML Document Syntax
XML document
Optional prolog
Exactly one root element, which may contain further elements
Optional miscellany
Comments
Processing instructions
E.g.,
<?xml version="1.0"?>
<greeting>Hello, world!</greeting>
<!-- A simple XML document. -->
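To make the document structure concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` (a tool choice of ours, not something these notes prescribe). It parses the example document, then shows that a second top-level element is rejected by the parser.

```python
import xml.etree.ElementTree as ET

# The example document: prolog, exactly one root element, then a comment (miscellany).
doc = """<?xml version="1.0"?>
<greeting>Hello, world!</greeting>
<!-- A simple XML document. -->"""

root = ET.fromstring(doc)
print(root.tag, "->", root.text)  # greeting -> Hello, world!

# A second top-level element breaks the single-root rule, so parsing fails.
try:
    ET.fromstring("<greeting>Hello</greeting><greeting>again</greeting>")
    two_roots_accepted = True
except ET.ParseError as err:
    two_roots_accepted = False
    print("not well-formed:", err)
```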
XML Element Syntax
Non-empty element
Start tag
Element name
Optional attribute specifications
Attribute name
Quoted string
Content
Elements
Character data
Entity or character references
CDATA sections
Processing instructions
Comments
End tag
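A small sketch, again assuming Python's `xml.etree.ElementTree`, of how a parser surfaces each kind of element content listed above. The `<note>` element and everything inside it are hypothetical, made up only to touch every item in the list.

```python
import xml.etree.ElementTree as ET

# Hypothetical element exercising each kind of content listed above.
doc = ('<note lang="en">'           # start tag: element name + one attribute
       'Tom &amp; Jerry&#33; '       # character data with entity/character references
       '<b>bold</b>'                 # a child element
       '<![CDATA[<not a tag>]]>'     # CDATA section: "<" is not treated as markup
       '<?app hint?>'                # processing instruction
       '<!-- a comment -->'          # comment
       '</note>')                    # end tag

note = ET.fromstring(doc)
bold = note.find("b")
print(note.tag, note.attrib)        # note {'lang': 'en'}
print(repr(note.text))              # 'Tom & Jerry! '
print(bold.text, repr(bold.tail))   # bold '<not a tag>'
print(len(note))                    # 1: by default ElementTree drops comments and PIs
```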
Bibliographic Entry in XML
<Publication URL="ftp://db.stanford.edu/pub/papers/xml.ps"
             Authors="RG JM JW">
   <Title>From Semistructured Data ... Language</Title>
   <Published>Proceedings of the ... Databases</Published>
   <Location>
      <City>Philadelphia</City>
      <State>Pennsylvania</State>
   </Location>
   <Date>
      <Month>June</Month>
      <Year>1999</Year>
   </Date>
</Publication>
<Author ID="RG">R. Goldman</Author>
<Author ID="JM">J. McHugh</Author>
<Author ID="JW">J. Widom</Author>
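The `Authors` attribute refers to `Author` elements by their `ID` values. Without a DTD declaring these attributes as `ID`/`IDREFS`, a non-validating parser just sees plain strings, so the application has to do the linking. A minimal Python sketch; the `<Bibliography>` wrapper is hypothetical, added only because a well-formed document needs a single root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical <Bibliography> root wrapping the fragment from the slide.
doc = """<Bibliography>
  <Publication URL="ftp://db.stanford.edu/pub/papers/xml.ps"
               Authors="RG JM JW">
    <Title>From Semistructured Data ... Language</Title>
  </Publication>
  <Author ID="RG">R. Goldman</Author>
  <Author ID="JM">J. McHugh</Author>
  <Author ID="JW">J. Widom</Author>
</Bibliography>"""

root = ET.fromstring(doc)

# Index every Author by its ID attribute ...
authors_by_id = {a.get("ID"): a.text for a in root.iter("Author")}

# ... then resolve the whitespace-separated references in Authors.
pub = root.find("Publication")
author_names = [authors_by_id[ref] for ref in pub.get("Authors").split()]
print(pub.findtext("Title"), "by", ", ".join(author_names))
```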
Example Weather Report in XML
<weather-report>
     <date>March 25, 1998</date>
     <time>08:00</time>
     <area>
        <city>Seattle</city>
        <state>WA</state>
        <region>West Coast</region>
        <country>USA</country>
     </area>
     <measurements>
        <skies>partly cloudy</skies>
        <temperature>46</temperature>
        ...
     </measurements>
</weather-report>
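Because the tags name the data, a program can pick out fields by name rather than by their position on a page. A sketch in Python with `xml.etree.ElementTree`, using the report with the elided `...` fields left out:

```python
import xml.etree.ElementTree as ET

# The weather report from above, minus the elided "..." measurements.
report = ET.fromstring("""<weather-report>
  <date>March 25, 1998</date>
  <time>08:00</time>
  <area>
    <city>Seattle</city>
    <state>WA</state>
    <region>West Coast</region>
    <country>USA</country>
  </area>
  <measurements>
    <skies>partly cloudy</skies>
    <temperature>46</temperature>
  </measurements>
</weather-report>""")

# Fields are located by tag path, independent of how a page would lay them out.
city = report.findtext("area/city")
skies = report.findtext("measurements/skies")
# The markup says this is a temperature, but not which unit it is in.
temp = int(report.findtext("measurements/temperature"))
print(f"{city}: {temp}, {skies}")
```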
Language Identification Attribute
<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
<sp who="Faust" desc='leise' xml:lang="de">
     <l>Habe nun, ach! Philosophie,</l>
     <l>Juristerei, und Medizin</l>
     <l>und leider auch Theologie</l>
     <l>durchaus studiert mit heißem Bemüh'n.</l>
</sp>
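The XML specification says an `xml:lang` value applies to the element's attributes and content unless a nested element overrides it with its own `xml:lang`. A minimal Python sketch of that inheritance rule; the `<play>` wrapper and the unlabelled `<p>` are hypothetical additions, and the helper name `iter_languages` is ours.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical <play> wrapper around parts of the examples above.
doc = """<play>
  <sp who="Faust" desc='leise' xml:lang="de">
    <l>Habe nun, ach! Philosophie,</l>
    <l>Juristerei, und Medizin</l>
  </sp>
  <p xml:lang="en-GB">What colour is it?</p>
  <p>No language declared here.</p>
</play>"""

def iter_languages(elem, inherited=None):
    """Yield (tag, language) pairs: xml:lang covers an element's content
    unless a descendant overrides it with its own xml:lang."""
    lang = elem.get(XML_LANG, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from iter_languages(child, lang)

rows = list(iter_languages(ET.fromstring(doc)))
for tag, lang in rows:
    print(tag, lang)
```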
Document Type Definition (DTD)
Describes the syntax of a class of XML documents
Which elements are present
Structural relationships between the elements
Contains or points to markup declarations
Element type declarations
Attribute-list declarations
Entity declarations
Notation declarations
Examples
<!DOCTYPE greeting SYSTEM "http://www. ...">
<!DOCTYPE greeting
          [<!ELEMENT greeting (#PCDATA)> ]>
Element Type Declaration
<!ELEMENT Name Content-Spec>
Content spec
EMPTY
ANY
Mixed content: character data, optionally interspersed with child elements
Element content: child elements only
Using a simple grammar governing the allowed types of the child elements and the order in which they may appear
Examples
<!ELEMENT MEMO (TO,FROM,SUBJECT,BODY,SIGN)>
<!ELEMENT BODY (P+)>
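Child-element content models behave like regular expressions over the sequence of child element names: `,` is concatenation, `|` is choice, and `?`, `*`, `+` are repetition. The sketch below is our own Python illustration of that idea (the helpers `content_model_to_regex` and `children_match` are hypothetical), not how any particular validating parser is implemented. It deliberately raises `ValueError` for `EMPTY`, `ANY` and mixed (`#PCDATA`) models.

```python
import re
import xml.etree.ElementTree as ET

TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[(),|?*+]")

def content_model_to_regex(spec):
    """Translate a children content model such as "(TO,FROM,BODY+)" into a
    regex over child names, each name followed by one space."""
    if "#" in spec or spec.strip() in ("EMPTY", "ANY"):
        raise ValueError("only child-element content models are handled")
    parts = []
    for tok in TOKEN.findall(spec):
        if tok == ",":
            continue                    # sequence: plain concatenation
        elif tok == "(":
            parts.append("(?:")         # non-capturing group
        elif tok in {")", "|", "?", "*", "+"}:
            parts.append(tok)           # choice and repetition carry over 1:1
        else:
            parts.append("(?:" + re.escape(tok) + " )")  # one child element name
    return re.compile("".join(parts))

def children_match(elem, spec):
    """True if the element's children, in order, satisfy the content model."""
    child_names = "".join(child.tag + " " for child in elem)
    return content_model_to_regex(spec).fullmatch(child_names) is not None

memo = ET.fromstring("<MEMO><TO/><FROM/><SUBJECT/><BODY><P/><P/></BODY><SIGN/></MEMO>")
print(children_match(memo, "(TO,FROM,SUBJECT,BODY,SIGN)"))  # True
print(children_match(memo.find("BODY"), "(P+)"))            # True
print(children_match(ET.fromstring("<BODY/>"), "(P+)"))     # False: P+ needs at least one P
```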
Attribute List Declaration
<!ATTLIST Name Attribute-Definition* >
Attribute definition
Name Attribute-Type Default-Declaration
Attribute type
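String type (CDATA)
Tokenized types (ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS)
Enumerated types (a list of allowed values, or NOTATION types)
Default declaration
#REQUIRED (the attribute must be given)
#IMPLIED (no default value provided)
Attribute value (character data), optionally preceded by #FIXED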
Examples
<!ATTLIST MEMO importance (HIGH|MEDIUM|LOW) "LOW">
<!ATTLIST SIGN signatureFile CDATA #IMPLIED
               email CDATA #REQUIRED>
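Python's standard-library expat binding reports each attribute definition in a DTD through `AttlistDeclHandler(elname, attname, type, default, required)`. A minimal sketch over a trimmed copy of the memo declarations (the `MEMO` content model is shortened to `(SIGN)` for brevity); expat does not validate, it only reports what was declared.

```python
import xml.parsers.expat

# Trimmed memo DTD in an internal subset, followed by a tiny instance document.
doc = """<?xml version="1.0"?>
<!DOCTYPE MEMO [
  <!ELEMENT MEMO (SIGN)>
  <!ATTLIST MEMO importance (HIGH|MEDIUM|LOW) "LOW">
  <!ELEMENT SIGN (#PCDATA)>
  <!ATTLIST SIGN signatureFile CDATA #IMPLIED
                 email CDATA #REQUIRED>
]>
<MEMO><SIGN email="SSMITH.CS.STANFORD.EDU">S. Smith</SIGN></MEMO>"""

declared = []

def on_attlist(elname, attname, att_type, default, required):
    # Called once per declared attribute; default is None for #IMPLIED and #REQUIRED.
    declared.append((elname, attname, att_type, default, bool(required)))

parser = xml.parsers.expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist
parser.Parse(doc, True)

for row in declared:
    print(row)
```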
Entity Declaration
<!ENTITY Name Entity-Definition >
Entity definition
Entity value
External ID
SYSTEM URL
PUBLIC Public-identifier URL
External ID followed by NDATA Notation-name (an unparsed entity)
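For an internal entity (an entity value given directly in the declaration), the parser substitutes the replacement text wherever the entity is referenced. A minimal Python sketch with `xml.etree.ElementTree`; the `dept` entity and the `<memo>` document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal entity declared in the internal DTD subset.
doc = """<!DOCTYPE memo [
  <!ENTITY dept "Computer Science Department">
]>
<memo>Greetings from the &dept;.</memo>"""

memo = ET.fromstring(doc)
# The reference &dept; has been replaced by the entity value.
print(memo.text)
```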
Example DTD: Memo
<!ELEMENT MEMO    (TO,FROM,SUBJECT,BODY,SIGN)>
<!ATTLIST MEMO    importance (HIGH|MEDIUM|LOW) "LOW">
<!ELEMENT TO      (#PCDATA)>
<!ELEMENT FROM    (#PCDATA)>
<!ELEMENT SUBJECT (#PCDATA)>
<!ELEMENT BODY    (P+)>
<!ELEMENT P       (#PCDATA)>
<!ELEMENT SIGN    (#PCDATA)>
<!ATTLIST SIGN    signatureFile CDATA #IMPLIED
                  email CDATA #REQUIRED>
Example Memo
<!DOCTYPE MEMO SYSTEM "http://www. ...">
<MEMO importance="HIGH">
  <TO>Jones</TO>
  <FROM>S.Smith</FROM>
  <SUBJECT>Project Plan</SUBJECT>
  <BODY>
    <P> … </P>
    <P> … </P>
  </BODY>
  <SIGN email="SSMITH.CS.STANFORD.EDU">
    S. Smith
  </SIGN>
</MEMO>
Example DTD: Novel
<!ELEMENT novel
   (preface,chapter+,biography?)>
<!ELEMENT preface (paragraph+)>
<!ELEMENT chapter (title,paragraph+,section+)>
<!ELEMENT section (title,paragraph+)>
<!ELEMENT biography (title,paragraph+)>
<!ELEMENT paragraph (#PCDATA|keyword)*>
<!ELEMENT title (#PCDATA|keyword)*>
<!ELEMENT keyword (#PCDATA)>
XML Namespaces
URI (Uniform Resource Identifier)
The Web is an information space
The URIs are the points in that space
URI: name or address that refers to a resource
URL (Uniform Resource Locator): URI that includes explicit instructions on how to access the resource on the internet
XML namespace
Set of names used as element types and attribute names
Identified by a URI
Universally unique
Qualified name
A prefix bound to a namespace URI, plus a local name; the expanded (URI, local name) pair is universally unique
Syntax:  Prefix ':' LocalPart
Declaring Namespaces
A namespace is declared with an attribute named xmlns (declaring the default namespace) or an attribute whose name has the prefix xmlns: (binding that prefix)
Example:  <x xmlns:edi = 'http://ecommerce.org/schema'>
          <!-- the "edi" prefix is bound to
             http://ecommerce.org/schema for the "x"
             element and contents -->
       </x>
A declaration applies to the element where it is specified and to all elements within the content of that element, unless overridden by another namespace declaration with the same attribute name
Example:  <!-- both namespace prefixes are available
            throughout -->
     <bk:book xmlns:bk = 'urn:loc.gov:books'
              xmlns:isbn = 'urn:ISBN:0-395-36341-6'>
        <bk:title> Cheaper by the Dozen </bk:title>
        <isbn:number> 1568491379 </isbn:number>
     </bk:book>
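A sketch in Python's `xml.etree.ElementTree` of the scoping rule: the parser replaces each prefix with the URI bound to it, and a prefix used outside the element that declares it is an error. The second document is a hypothetical variation of the book example.

```python
import xml.etree.ElementTree as ET

in_scope = """<bk:book xmlns:bk='urn:loc.gov:books'
         xmlns:isbn='urn:ISBN:0-395-36341-6'>
   <bk:title> Cheaper by the Dozen </bk:title>
   <isbn:number> 1568491379 </isbn:number>
</bk:book>"""

book = ET.fromstring(in_scope)
# ElementTree rewrites each prefixed name as {namespace-URI}local-name.
print([elem.tag for elem in book.iter()])

# Here isbn is bound only on <bk:title>, so it is out of scope on the sibling <isbn:number>.
out_of_scope = """<bk:book xmlns:bk='urn:loc.gov:books'>
   <bk:title xmlns:isbn='urn:ISBN:0-395-36341-6'> Cheaper by the Dozen </bk:title>
   <isbn:number> 1568491379 </isbn:number>
</bk:book>"""
try:
    ET.fromstring(out_of_scope)
    out_of_scope_accepted = True
except ET.ParseError as err:
    out_of_scope_accepted = False
    print("rejected:", err)
```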
Default Namespace
Example of multiple namespace prefixes in an element
<bk:book xmlns:bk = 'urn:loc.gov:books'
         xmlns:isbn = 'urn:ISBN:0-395-36341-6'>
   <bk:title> Cheaper by the Dozen </bk:title>
   <isbn:number> 1568491379 </isbn:number>
</bk:book>
Example of default namespace in an element
<book xmlns = 'urn:loc.gov:books'
      xmlns:isbn = 'urn:ISBN:0-395-36341-6'>
   <title> Cheaper by the Dozen </title>
   <isbn:number> 1568491379 </isbn:number>
</book>
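The two forms above name the same elements. A minimal Python sketch comparing the namespace-expanded names that `xml.etree.ElementTree` produces for each (the helper `expanded_names` is ours):

```python
import xml.etree.ElementTree as ET

prefixed = """<bk:book xmlns:bk='urn:loc.gov:books'
         xmlns:isbn='urn:ISBN:0-395-36341-6'>
   <bk:title> Cheaper by the Dozen </bk:title>
   <isbn:number> 1568491379 </isbn:number>
</bk:book>"""

defaulted = """<book xmlns='urn:loc.gov:books'
      xmlns:isbn='urn:ISBN:0-395-36341-6'>
   <title> Cheaper by the Dozen </title>
   <isbn:number> 1568491379 </isbn:number>
</book>"""

def expanded_names(xml_text):
    """List the namespace-expanded tag of every element, in document order."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]

print(expanded_names(prefixed) == expanded_names(defaulted))
```

One caveat from the Namespaces specification: a default namespace applies to unprefixed element names only, not to unprefixed attribute names.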
Extensible Markup Language (XML)
Provides a “syntactic schema”
Provides no means of specifying semantics