Web Query Languages, Intelligent Information Integration

Alan K. Dippel

CSC-671
November 30, 1998
Professor: Fred Sadri

Department of Mathematical Sciences
University of North Carolina at Greensboro
Greensboro, NC

http://www.cs.indiana.edu/~adippel/csc671/web_query_lang.htm

Abstract

The following is a review of two documents dealing with information integration on the World-Wide Web. The Web is made up of numerous sources of information that users would like to explore, but the layman in the computer world does not have the expertise to access this data. It is up to the database integrator to bring this information together for the user. One paper deals with the more abstract problems facing the web searcher and data integrator; the other presents a practical language for querying XML data, a real-life application on the web.


Overview

The two papers were written with different purposes in mind. The authors of "XML-QL: A Query Language for XML" [DFFLS98] submitted XML-QL to the World Wide Web Consortium as a hands-on, working proposal for querying XML data on web sites. In contrast, "Database Techniques for the World-Wide Web: A Survey" [FLM98] was written to provide insight into what is being done in the area of web data integration and querying.

The following sections give a detailed summary of the obstacles the database integrator faces in presenting World-Wide Web information to the end user. The authors of Database Techniques [FLM98] tried not to cover too broad a subject area in their short survey of database techniques on the web. The article includes a good deal of information, but does not give enough examples to convey a good flavor of what is being done in the database area on the World-Wide Web. Some areas of work are mentioned only in a short paragraph, with a reference to a related article that points the reader in the right direction for further research.


Information Management of Data on the World-Wide Web

D. Florescu, A. Levy, and A. O. Mendelzon presented three areas of discussion for Querying on the World-Wide Web.

  1. The article starts by discussing the structure of the WWW, ways of modeling the web, and how the web can be queried using these models.

  2. Next the discussion moves to how data can be extracted from the web and integrated for presentation to the end user. The ultimate goal is to present information that meets the user's needs.

  3. Finally, the article discusses how to construct new sites and restructure existing sites in order to facilitate querying of WWW information.

How to represent the data structure of the Web for Web/DB?

The search engines of today do not take the structure of the Internet into account, but instead throw vast amounts of data at you. The user must sort through the data to find what they want. The first requirement for querying is that web pages and links be modeled. Different data models have been used to model the data found on the web. The majority of the data on the web is semistructured and therefore has no fixed schema. In addition, these pages and data sources change from day to day, making modeling of the data more difficult. The following models have been used to represent Web/DB structures.

  • Graph data models:
    A labeled graph data model is a natural way to represent the data pages and the links between them. Nodes in the graph represent the pages or components, and the links are represented by the arcs between the nodes.

    Example of a graph data model - http://www.mypages/

    [Figure: GraphModel.gif]
  • Semistructured data models:
    The structure of the data is irregular in that there is no fixed schema for the data. The structures can be modeled by labeled directed graphs. Queries over these graphs use arc variables, which are bound to the labels on arcs rather than to the nodes of the labeled graph. There are also no restrictions on the set of arcs that emerge from a node in the structure.

    Semistructured data can be described by the following structure characteristics.

    • Schema is not given in advance, but is implicit in the data.
    • Schema is relatively large and changes frequently.
    • Schema is illustrative of the data rather than prescriptive (i.e. it describes the current structure, but allows for violations of the schema).
    • Data is not strongly typed which means that attributes with the same name may change type as they are used in different places.

    XML data falls in this area of semistructured data.

  • Web data models in general:
    Web data models are able to model the structure of hypertext documents as we know them. These models are able to better represent the web structure.
    • Some models distinguish pages as unary relations and links as binary relations.
    • Some models distinguish a link within a page (node) or within a web site from an external link outside the web site. This is important because links (arcs) can generally be traversed only in the forward direction.
    • Models have varying capabilities for modeling order among elements, modeling nested data structures, and supporting collection types (sets, bags, arrays).
  • ADM - data model of the ARANEUS project.
    The ARANEUS project is an example of a web site database model that is explicitly structured for web/DB querying.
    • Ulixes and Penelope - The Ulixes language is used to build relational views over the web. An ADM model is used to represent the Web. This model supports Web pages and page schemes, nested data structures, and collection types. The Penelope language is used to create hypertext views of the data.

Example of a web site using ARANEUS

ADM Scheme for the Louvre Web Server
http://www.dia.uniroma3.it/Araneus/adm/louvreadm.html [LOUVRE98]
This is a portion of Louvre Scheme, an ADM scheme for the Louvre Web server.
[Figure: louvrescheme.gif, with legend legenda.gif]
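
The labeled graph models described earlier in this section can be illustrated with a few lines of Python. This is only a sketch under assumptions: the page names, arc labels, and the helper functions build_graph and arcs_from are hypothetical and do not come from the survey. It shows arcs carrying labels and nodes that do not all have the same set of outgoing arcs, which is what "no fixed schema" means in practice.

```python
from collections import defaultdict

# Each arc is (source node, label, target); a target may be another node or a plain value.
# All names here are made up for illustration.
arcs = [
    ("home", "link", "dept"),
    ("home", "title", "My Pages"),
    ("dept", "link", "home"),
    ("dept", "phone", "555-0100"),   # 'home' has no phone arc: no fixed schema
]

def build_graph(arc_list):
    """Index arcs by source node so outgoing arcs can be listed quickly."""
    graph = defaultdict(list)
    for source, label, target in arc_list:
        graph[source].append((label, target))
    return graph

def arcs_from(graph, node, label=None):
    """Return targets of arcs leaving `node`, optionally restricted to one label."""
    return [t for (l, t) in graph.get(node, []) if label is None or l == label]

graph = build_graph(arcs)
print(arcs_from(graph, "home", "link"))   # pages that 'home' links to
print(arcs_from(graph, "home", "phone"))  # empty list: the arc is simply absent
```

A query language over such a model only needs to follow labeled arcs forward, which matches the observation that links can generally be traversed only in the forward direction.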

What about interactive sites?

All of the above models represent structures of the web that do not change interactively. More and more of the web is becoming interactive, with pages presented to the user based on how they respond to prompts. This is a vast area of the web that was not covered in the survey, Database Techniques for the World-Wide Web. The authors did acknowledge this shortcoming and pointed out that this is an area that needs to be researched.


Querying of the data structures modeled.

As stated above, the content-only searches of current Internet search engines ignore the structure of pages. Once we have created a model that takes this structure into account, we must develop a method of querying this model. Query languages must be developed that a person familiar with query languages can use.

Structural searches look at the structure of the web site for patterns that match what is being asked for. This allows the search to return a list of pages or data that is more closely related to your search string. Instead of returning all 8,000 pages with a certain string the search can return a list that points to a group of pages that give a broader coverage of the search string subject. A prototype next-generation web search engine named Google was mentioned in the paper.

Theory of web queries.

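Original web query theory was based on the fact that the only possible way to access the web is to navigate links from known starting points. Under this assumption, the query "list all web documents that no other document points to" cannot be answered: a document that nothing points to can never be reached by navigation, and confirming that nothing points to a document would require examining every document on the web.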
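
The navigation-only assumption can be made concrete with a small sketch. The four-page "web" below and the function name reachable are hypothetical; the point is only that a page with no incoming links is never discovered when all access starts from a known page and follows links forward.

```python
from collections import deque

# Hypothetical miniature web: page -> pages it links to.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
    "orphan": ["a"],   # no page links to 'orphan'
}

def reachable(start, link_map):
    """Breadth-first navigation from a known starting page."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in link_map.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

found = reachable("a", links)
print(sorted(found))   # 'orphan' is not in the result
```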

Related Query paradigms

The authors touch on other related query paradigms that have developed, but not specifically for querying the Web. These query languages are similar to the web query languages.

  • Hypertext/ Document query languages:

These languages were developed in the pre web era. One example uses a method of mapping documents to object oriented database instances. Then the structure of the database can be queried using the language of the database.

  • Graph query languages:

Applications in software engineering and computer networks led to graph-based languages such as G, G+, and GraphLog, which is based on Datalog. These languages model data as labeled graphs.

  • Semistructured query languages.

UnQL, StruQL, Lorel - also use labeled graphs, but emphasize the ability to query the schema of the data structure. These languages were not developed for the web and do not distinguish between graph edges that connect within a document and a hyperlink to another document.


First Generation - Web Query Languages.

  • WebSQL - models the web as a relational database with two relations, Document and Anchor. The Document relation has one tuple for each document on the web. The Anchor relation has one tuple for each anchor in each document. These tuples are virtual and cannot be enumerated, so a query starts its search from known URLs in the FROM clause. The language uses the symbols -> for a link to the same site, #> for a link within the same document, and => for a link to another site.
    • Example of a WebSQL query that returns triples. It finds all external links from all pages reachable from "www.mysite.start" through links on the same site.
      ******************************
      SELECT d.url, e.url, a.label
      FROM Document d SUCH THAT
         "www.mysite.start" ->* d,
         Document e SUCH THAT d => e,
         Anchor a SUCH THAT a.base = d.url
      WHERE a.href = e.url

      A condition such as d MENTIONS "search string" could be added to the FROM clause to search for a certain string within the pages.
    • Another example finds the URL and title of all documents that mention the text "web queries", starting at http://www.cs.indiana.edu/ and reachable by a path of at most three links on the same site.
      *******************************
      SELECT d2.url, d2.title
      FROM Document d1 = "http://www.cs.indiana.edu",
         Document d2 SUCH THAT d1 =|->|->->|->->-> d2
         and d2 MENTIONS "web queries",
         Anchor a SUCH THAT a.base = d1.url
      WHERE a.href = d2.url

    The MENTIONS clause in the language is implemented by sending a query to a search engine such as AltaVista or HotBot. The resulting set of tuples, URLs and titles, identifies the documents on the web that contain the phrase searched for. A WebSQL tutorial is found at: http://techst02.technion.ac.il/~c1200963/seminar/seminar.html

  • W3QL - Similar to WebSQL, but uses external programs for specifying content conditions on files instead of including these in the language. The next generation will replace these external methods with extensible methods based on the MIME standard.
  • WebLog - Uses a deductive rules language, DataLog, instead of SQL-like syntax.
  • WQL - query language of the WebDB project is similar to WebSQL, but supports more comprehensive SQL functionality such as aggregation and grouping. Also has limited intra-document structure querying.
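
The first WebSQL example above can be mimicked in Python to show what the path symbols mean. This is a rough sketch, not how WebSQL is implemented: the Anchor tuples are invented, the helper names are hypothetical, and "same site" is approximated by comparing host names. Interior links (#>) are left out.

```python
from urllib.parse import urlsplit

# Hypothetical Anchor relation: one (base, href, label) tuple per anchor.
ANCHORS = [
    ("http://www.mysite.start/", "http://www.mysite.start/a.html", "A"),
    ("http://www.mysite.start/a.html", "http://other.example/x.html", "X"),
    ("http://www.mysite.start/a.html", "http://www.mysite.start/", "Home"),
    ("http://other.example/x.html", "http://third.example/", "Third"),
]

def is_local(base, href):
    """A local link ('->') stays on the same host; a global link ('=>') does not."""
    return urlsplit(base).netloc == urlsplit(href).netloc

def local_closure(start):
    """Documents reachable from `start` by zero or more local links ('->*')."""
    seen, todo = {start}, [start]
    while todo:
        doc = todo.pop()
        for base, href, _ in ANCHORS:
            if base == doc and is_local(base, href) and href not in seen:
                seen.add(href)
                todo.append(href)
    return seen

def external_links(start):
    """Rough equivalent of the first example: rows of (d.url, e.url, a.label)."""
    docs = local_closure(start)
    return sorted((b, h, l) for b, h, l in ANCHORS
                  if b in docs and not is_local(b, h))

rows = external_links("http://www.mysite.start/")
print(rows)
```

Only anchors whose base document was reached by local links are examined, which reflects the point that the Document and Anchor tuples are virtual and are materialized only by navigation from a known URL.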

Second generation: Web Data Manipulation Languages.

The second generation languages are much more powerful and model the internal structure of web pages in addition to the links between the pages. These languages can create new structures based on the queries of the web pages.

  • Florid -   Prototype implementation of the deductive and object-oriented formalism F-Logic.

    A web document is modeled by two classes, url and webdoc, both subclasses of string.

    (Example of a Florid model)

    The url class has only a get method.
    The webdoc class has the methods url, author, modif, type, hrefs, and error.

    • url::string[get => webdoc]
      webdoc::string[url => url; author => string;
      modif => string;
      type => string; hrefs@(string) =>> url;
      error =>> string]

      ("www.cs.toronto.edu":url).get.
      (Y:url).get <-
      (X:url).get.[hrefs@(L)=>>{Y}],
      substr("database",L)

    The above Florid program extracts the set of all documents reachable, directly or indirectly, from http://www.cs.toronto.edu/ by following links whose labels contain the string "database".

  • WebOQL - uses the hypertree data structure: ordered arc-labeled trees with two types of arcs, internal and external. Internal arcs represent structured objects and external arcs represent references (typically hyperlinks). Sets of hypertrees are collected into webs.
  • StruQL - query language of the Strudel web site management system. Based on labeled directed graphs. Supports URLs, PostScript, text, image, and HTML files.
  • ARANEUS - [ARANEUS98] A database project that uses the Ulixes language to build relational views of the data and then generates hypertextual views for the user using the Penelope language.
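
The rule in the Florid program above tests substr("database",L) on the link label, so a deductive evaluation only follows links whose labels contain that string. A rough Python analogue, assuming made-up link facts and a hypothetical function name fetched_documents, applies the rule repeatedly until nothing new is derived. It is not FLORID's actual evaluation strategy, only a naive fixpoint.

```python
# Hypothetical link facts: (source URL, anchor label, target URL).
HREFS = [
    ("www.cs.toronto.edu", "database group", "db.html"),
    ("www.cs.toronto.edu", "sports", "sports.html"),
    ("db.html", "database papers", "papers.html"),
    ("sports.html", "database of scores", "scores.html"),
]

def fetched_documents(start):
    """Naive bottom-up evaluation: apply the rule until no new fact is derived."""
    fetched = {start}                       # the initial fact: start is fetched
    changed = True
    while changed:
        changed = False
        for source, label, target in HREFS:
            # Rule body: source already fetched and the label contains "database".
            if source in fetched and "database" in label and target not in fetched:
                fetched.add(target)
                changed = True
    return fetched

docs = fetched_documents("www.cs.toronto.edu")
print(sorted(docs))
```

Note that scores.html is never derived even though its link label mentions "database", because the page that links to it was not reached by a qualifying link.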

 


System    Data Model           Language Style        Path Expression  Graph Expression
WebSQL    relational           SQL                   Yes              No
W3QL      labeled multigraphs  SQL                   Yes              No
WebLog    relational           DataLog               No               No
Lorel     labeled graphs       OQL                   Yes              No
WebOQL    hypertrees           OQL                   Yes              Yes
UnQL      labeled graphs       structural recursion  Yes              Yes
STRUDEL   labeled graphs       DataLog               Yes              Yes
ARANEUS   page schemas         SQL                   Yes              Yes
FLORID    F-Logic              DataLog               Yes              No

Table 1. Comparison of query systems.
[FLM98] Database Techniques for the World-Wide Web: A Survey

 

Summary of web query languages.

All of the above languages are too complex to be used directly by interactive users.  Work in the area of interactive query interfaces suitable for the casual user is being done. This is the area of data integration that has the most potential of making information on the web available to the public.


Information Integration.

The web can be thought of as containers of sets of tuples, embedded in HTML or hidden behind form interfaces. The method of accessing all of this data is to create a wrapper that gives the illusion that the web site is serving sets of tuples. A site together with its wrapper is called a web source. These sources can then be combined to answer queries that use data from various web sources. Developing these wrappers to integrate the data is not a simple task.
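
A wrapper in this sense can be sketched with Python's standard html.parser module. The page content, the class name TableWrapper, and the function wrap are all hypothetical; real wrappers must cope with far messier HTML. The sketch only shows the idea of turning a page into a set of tuples.

```python
from html.parser import HTMLParser

class TableWrapper(HTMLParser):
    """Collects the text of each <td> cell, producing one tuple per <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(tuple(self._row))
            self._row = None

def wrap(html_text):
    """Return the page's table rows as a list of tuples."""
    parser = TableWrapper()
    parser.feed(html_text)
    parser.close()
    return parser.rows

# Made-up page standing in for a movie web site.
PAGE = """<html><body><table>
<tr><td>Casablanca</td><td>1942</td></tr>
<tr><td>Vertigo</td><td>1958</td></tr>
</table></body></html>"""

movies = wrap(PAGE)
print(movies)
```

To the rest of the integration system, the site now looks like a relation of (title, year) tuples, regardless of how the page is laid out.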

Problems to deal with in web integration.

  • Large and evolving number of web sources.
  • Very little meta-data about the characteristics of the source web data.
  • Larger degree of source autonomy

Two approaches to dealing with these problems  of the vast amount of web data are proposed.

  1. warehousing - data from multiple web sources is loaded into a warehouse and queries are run on the warehouse. This simplifies the access and also speeds  up the query, but does not address the vast changes that are always happening to the web.
  2. virtual - data is left on the web and at query time is searched from the data sources. This approach is appropriate for systems that have a large number of sources, data is changing frequently and there is little control over the web source. This approach may take more time, but gives the user a fresh up to date answer to their query.

The virtual data integration approach was the one focused on by the authors.

Two major differences from traditional database system are pointed out.

  1. The query system does not communicate directly with a storage manager, but uses wrappers.
  2. The user does not pose queries directly in the schema in which the data is stored, which provides data independence. Instead, virtual relations are queried and a source description is used to reformulate the query for the source schema. This translation is transparent to the user.

Specifications of mediated schema and reformulation:

The end user is given an intermediate (mediated) schema to query, designed to make querying easier. This schema is the set of collection and attribute names used in the queries. Queries over the mediated schema are translated to the data source schemas by the Mediator program.

Two approaches are used to relate the mediated schema to the sources: Global as View and Local as View.

  1. GAV (Global as View) - for each relation R in the mediated schema, we write a query over the source relations specifying how to obtain R's tuples.
  2. LAV (Local as View) - for every information source S, we write a query over the relations in the mediated schema that describes which tuples are found in S.
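
The difference between the two approaches can be sketched as follows. The source relations, the mediated relation Movie(title, year, director), and all function names are assumptions made for illustration. Under GAV the mediated relation is literally a query over the sources, so answering a user query is a matter of unfolding that definition. Under LAV only descriptions of the sources are written down (shown here as plain strings), and the mediator has to work out how to use them, which is the harder problem and is not implemented in this sketch.

```python
# Hypothetical source relations exported by two wrappers.
source_a = [("Casablanca", 1942), ("Vertigo", 1958)]             # (title, year)
source_b = [("Vertigo", "Hitchcock"), ("Psycho", "Hitchcock")]   # (title, director)

def movie_gav():
    """GAV: the mediated relation Movie(title, year, director) is defined
    as a join of the source relations on title."""
    directors = dict(source_b)
    return [(title, year, directors[title])
            for title, year in source_a if title in directors]

# LAV goes the other way: each source is described as a view over the
# mediated schema. These SQL-like strings are illustrative only.
lav_descriptions = {
    "source_a": "SELECT title, year FROM Movie",
    "source_b": "SELECT title, director FROM Movie",
}

def query_movies_by_director(name):
    """A user query over the mediated schema, answered by unfolding the GAV view."""
    return [(t, y) for (t, y, d) in movie_gav() if d == name]

print(query_movies_by_director("Hitchcock"))
```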

Obstacles in creating query structures on the web.

  • Web sites are not complete - In general, sources of data are not complete for the domain that they cover.
  • Query processing capabilities - Sources have varying query processing capabilities. Data may actually be stored in a structured file or legacy system with limited access. Even if the data is in a database, the site may provide only limited access.

    Example: it may not be allowed to ask for the entire movie database, but only for an individual movie.
  • Query optimization:
  • Query execution engines:
  • Data in the form of an HTML page - XML may lead web site builders to export the data underlying their sites in a machine readable form, thereby making it easier to design wrappers.
  • Attributes with same name contain different data - It is difficult to match objects across various sources of information
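
The limited-access obstacle in the movie example above can be sketched in Python. Everything here is hypothetical: the movie data, the lookup_movie function that stands in for a form interface, and the second source. The sketch shows one common way around the restriction, which is to obtain the required input values from another source and feed them to the restricted one, one lookup at a time.

```python
# Hypothetical movie source: the full table is hidden behind a lookup form.
_MOVIES = {"Casablanca": 1942, "Vertigo": 1958, "Psycho": 1960}

def lookup_movie(title):
    """The only access path the source offers: title in, (title, year) out."""
    if title in _MOVIES:
        return (title, _MOVIES[title])
    return None

# A second hypothetical source that can be scanned freely.
SHOWTIMES = [("Vertigo", "Theater A"), ("Psycho", "Theater B")]

def years_of_movies_showing():
    """Titles from SHOWTIMES supply the bindings that lookup_movie requires."""
    answers = []
    for title in sorted({t for t, _ in SHOWTIMES}):
        row = lookup_movie(title)
        if row is not None:
            answers.append(row)
    return answers

print(years_of_movies_showing())
```

The query planner in an integration system has to find such an ordering of sources automatically, which is part of why query optimization appears in the list of obstacles.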

Overall view of a data integration system

[Figure: datasystem.gif]

The above diagram is from Database Techniques for the World-Wide Web [FLM98].


Web site construction and restructuring.

The final section of the article discusses building a web site to support data integration.

Creation of web sites is normally broken into the following tasks.

  • Choosing and accessing the data that will be displayed at the site.
  • Designing the site's structure.
  • Designing the graphical presentation of pages.

The tasks of updating a site, restructuring a site, or enforcing integrity constraints on a site's structure are tedious to perform. The rewards of designing the site correctly justify the effort. If the web site is defined declaratively as a query, and not procedurally by a program, it is easy to change the query to create multiple views of the information for different classes of users.

How is the web site presented to the user?

Normally a web site is created a page at a time and the person building the page must keep track of all of the pages and how they are related and work to present a seamless graphical presentation to the user.

The figure below shows how the web site would be designed using query structures. Wrappers translate the data from the different sources on the site and a mediator program presents these various sources of data to the declarative web site structure. The declarative structure is defined as views over the data presented by the wrapper interfaces. Since these views are defined by a query of the data it is possible to present different logical views of the web site to different classes of users. Finally a consistent graphical presentation specification ensures that the theme of the site looks and feels the same from page to page.

Example: Users on the Internet only see what is defined for external viewers, but users within the organization see an Intranet view of the web site, which presents much more information to them. Different levels of security are also possible using login/password verification when a person accesses the web site.

 

[Figure: websitemangmnt.gif]

Architecture for Web Site Management Systems

Summary of web construction and restructuring

A big advantage of using a query view of the structure of the site is the ability to easily redefine a site just by changing the site definition. Normally a great deal of time is required to recreate new HTML pages to present the site and it is difficult to integrate the changes smoothly. As the underlying data of a site changes the queries can be changed to facilitate the upgrade of the site.
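
The idea of defining a site declaratively can be sketched in a few lines. The staff data, the VIEWS table, and render_page are hypothetical and much simpler than a real site management system. The point is that each class of user gets a page generated from the same data, and changing a view means editing the definition rather than rewriting HTML pages by hand.

```python
from html import escape

# Hypothetical data as delivered by the mediator.
STAFF = [
    {"name": "A. Smith", "office": "Room 12", "salary_grade": "G7"},
    {"name": "B. Jones", "office": "Room 30", "salary_grade": "G5"},
]

# Site definition: each view is just a choice of fields, not hand-written pages.
VIEWS = {
    "internet": ["name"],
    "intranet": ["name", "office", "salary_grade"],
}

def render_page(view_name):
    """Generate one HTML page for the given class of user from the same data."""
    fields = VIEWS[view_name]
    rows = []
    for person in STAFF:
        cells = "".join(f"<td>{escape(person[f])}</td>" for f in fields)
        rows.append(f"<tr>{cells}</tr>")
    return "<table>" + "".join(rows) + "</table>"

public_page = render_page("internet")
internal_page = render_page("intranet")
print(public_page)
```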


XML-QL a query language for XML.

The document on XML-QL, authored by Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, and Dan Suciu, is a practical discussion of a possible standard for the web querying community. The proposed standard is based on XML, the new standard for data exchange on the Web. The authors suggest the language to fill the void in extracting and integrating data from XML sources into a web client. The language could also be used to send a query to the XML host requesting certain data.

The language has a SELECT-WHERE style structure similar to SQL and draws on query languages for semistructured data. The XML that it queries has a simple and very flexible structure: it only requires that the tags match and be properly nested.

example:

<person><name> Alan </name><phone> 775-5555 </phone><address> P.O. Box 56, Kernersville, NC </address></person>

The actual schema of the XML data is contained in the data itself. This allows the XML structure to vary as the web site changes. The creator of the web site is not limited in the complexity of the structures used in the web pages.

Questions about XML data.

How will data be extracted? How will data be exchanged: by shipping the raw data or by sending the query to the host? How will data be translated between different user domains? How do you integrate data from multiple XML sources? These are some of the questions that remain to be solved. XML-QL is a new language and will undoubtedly change and mature as more web users start to use the XML standard for structuring their web sites.

Modeling XML-QL

A variation of the semistructured data model is used as a model in the proposed XML-QL query language. Research in the area of semistructured data was used to design the language.

Definition. An XML Graph consists of:

  • A graph, G, in which each vertex is represented by a unique string called an object identifier (OID),
  • G's edges are labeled with element tag identifiers,
  • G's nodes are labeled with sets of attribute-value pairs,
  • G's leaves are labeled with values (strings), and
  • G has a distinguished node called the root. 

[Figure: ex1-new.gif, an example XML graph with its root node]

As defined above, book and title are examples of element tag identifiers, (year="1995") is an example of a node attribute-value pair, and "An Introduction..." and "Addison-Wesley" are examples of leaves. The root is the topmost, beginning node of the graph.
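
The XML Graph definition above can be approximated with Python's standard xml.etree.ElementTree module. This is a sketch under assumptions: the OID naming scheme (o1, o2, ...), the dictionary layout, and the function name xml_graph are invented here, and text that is mixed with child elements is ignored for simplicity.

```python
import itertools
import xml.etree.ElementTree as ET

def xml_graph(xml_text):
    """Return (nodes, edges) for an XML document.
    nodes: oid -> {"attrs": attribute-value pairs, "value": leaf string or None}
    edges: list of (parent_oid, element_tag, child_oid)"""
    counter = itertools.count(1)
    nodes, edges = {"root": {"attrs": {}, "value": None}}, []

    def visit(element, parent_oid):
        oid = f"o{next(counter)}"            # unique object identifier
        is_leaf = len(element) == 0          # no child elements
        nodes[oid] = {
            "attrs": dict(element.attrib),
            "value": (element.text or "").strip() if is_leaf else None,
        }
        edges.append((parent_oid, element.tag, oid))   # edge labeled with the tag
        for child in element:
            visit(child, oid)

    visit(ET.fromstring(xml_text), "root")
    return nodes, edges

DOC = """<bib><book year="1995">
<title> An Introduction to Database Systems </title>
<publisher><name> Addison-Wesley </name></publisher>
</book></bib>"""

nodes, edges = xml_graph(DOC)
for edge in edges:
    print(edge)
```

In this representation the book node carries the attribute-value pair year="1995", while the title and name nodes are leaves whose values are strings.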

Examples from XML-QL: A Query Language for XML

[Figure: ex1.gif]

<title><CDATA> A Trip to </CDATA><titlepart><CDATA> the Moon </CDATA></titlepart></title>

 

The authors also mention XSL which is a standard intended for specifying style and layout of the XML documents. This is not the same idea as the XML-QL language. XML-QL is capable of much more data-intensive operations and transformations of data.

XML has already facilitated the exchange of data over the Web. It does so by not limiting the tags in a document: users are able to create all the tags that they wish, and the tags themselves define the schema of the data in the pages. Industry initiatives and current applications are growing rapidly (see [COVER98]). The authors present the XML-QL language through example queries and a language syntax. The following is an example from the paper.

First, a simple query that extracts data from an XML document is presented. The DTD (Document Type Definition) is as follows:

<!ELEMENT book (author+, title, publisher)>
<!ATTLIST book year CDATA>
<!ELEMENT article (author+, title, year?, (shortversion|longversion))>
<!ATTLIST article type CDATA>
<!ELEMENT publisher (name, address)>
<!ELEMENT author (firstname?, lastname)>

The query statement is made up of a WHERE clause and a CONSTRUCT clause:

WHERE <book>
<publisher><name>Addison-Wesley</></>
<title> $t</>
<author> $a</>
</> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
<author> $a</>
<title> $t</>
</>

This query is then applied to a small data structure:

<bib>
<book year="1995">
<!-- A good introductory text -->
<title> An Introduction to Database Systems </title>
<author> <lastname> Date </lastname> </author>
<publisher> <name> Addison-Wesley </name > </publisher>
</book>

<book year="1998">
<title> Foundation for Object/Relational Databases: The Third Manifesto </title>
<author> <lastname> Date </lastname> </author>
<author> <lastname> Darwen </lastname> </author>
<publisher> <name> Addison-Wesley </name > </publisher>
</book>
</bib>

and the following result block is produced:

<result>
<author> <lastname> Date </lastname> </author>
<title> An Introduction to Database Systems </title>
</result>

<result>
<author> <lastname> Date </lastname> </author>
<title> Foundation for Object/Relational Databases: The Third Manifesto </title>
</result>

<result>
<author> <lastname> Darwen </lastname> </author>
<title> Foundation for Object/Relational Databases: The Third Manifesto </title>
</result>

This is just a simple example, but structures as complex as you can imagine can be produced using the tags that you define.
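
To check the result block by hand, the query can be imitated in Python with the standard xml.etree.ElementTree module. This is not an XML-QL implementation; the function name run_query is hypothetical and the matching logic is hard-coded for this one query. It shows why three results appear: the WHERE clause binds $t and $a once for every title and author combination within each matching book.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
<book year="1995">
<title> An Introduction to Database Systems </title>
<author> <lastname> Date </lastname> </author>
<publisher> <name> Addison-Wesley </name> </publisher>
</book>
<book year="1998">
<title> Foundation for Object/Relational Databases: The Third Manifesto </title>
<author> <lastname> Date </lastname> </author>
<author> <lastname> Darwen </lastname> </author>
<publisher> <name> Addison-Wesley </name> </publisher>
</book>
</bib>"""

def run_query(xml_text, publisher="Addison-Wesley"):
    """Imitate WHERE <book>...</> CONSTRUCT <result>...</>: one result
    per (title, author) binding of each matching book."""
    results = []
    for book in ET.fromstring(xml_text).iter("book"):
        names = [n.text.strip() for n in book.findall("publisher/name")]
        if publisher not in names:
            continue
        for title in book.findall("title"):
            for author in book.findall("author"):
                result = ET.Element("result")
                result.append(author)   # plays the role of $a
                result.append(title)    # plays the role of $t
                results.append(result)
    return results

results = run_query(BIB)
for r in results:
    print(ET.tostring(r, encoding="unicode"))
```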

 


Conclusions

The authors of "Database Techniques for the World-Wide Web: A Survey" presented a good selection of ideas and examples of database query languages for the web. The purpose of the paper was to get more of the web community interested in developing and applying methods of querying the Internet. The paper was not geared toward a person writing applications on the Internet, but more toward the theoretical group. I think a more hands-on approach could have been taken to spur more excitement in the subject of web querying. Querying of the Internet is a tool that can be very useful. I am often discouraged by the information that I get from today's search engines and look forward to the day when we will be able to ask the web a question and get back an authoritative list of information.

The authors of "XML-QL: A Query Language for XML" did a very good job of presenting a working representation of a standard. The purpose of the paper was to present the semantics and syntax of the language, and I think this was accomplished through the examples. The idea of using XML as a database standard along with XML-QL is good. The use of a tag-based language is well known, and such a language can easily represent very complex structures through the use of tags. The next step after a language is defined is to determine how it is to be implemented. This is the part that will take a lot of work and cooperation within the Internet community. The Internet is a vast source of data and finding ways to make it more accessible will benefit us all.


Appendix: Grammar for XML-QL

Grammar for XML-QL from
"XML-QL: A query language for XML"

 

Note: The grammar is still being developed as the language evolves, and is incomplete in the current version of this document. Terminal symbols are shown in angular brackets and their lexical structure is not further specified.
 

XML-QL Grammar
XML-QL ::= (Function | Query) <EOF>
Function ::= 'FUNCTION' <FUN-ID> '(' (<VAR>(':' <DTD>)?)* ')' (':' <DTD>)?
     Query
'END'
Query ::= Element | Literal |  <VAR> | QueryBlock 
Element ::=  StartTag  Query  EndTag
StartTag ::= '<'(<ID>|<VAR>) SkolemID? Attribute* '>' 
SkolemID ::= <ID> '(' <VAR> (',' <VAR>)* ')'
Attribute ::= <ID> '='  ('"' <STRING> '"'  |  <VAR> )
EndTag ::= '<' / <ID>? '>'
Literal ::= <STRING>
QueryBlock ::=  Where  Construct ('{' QueryBlock '}')*
Where ::= 'WHERE' Condition (',' Condition)*
Construct ::= OrderedBy? 'CONSTRUCT'  Query
Condition ::= Pattern BindingAs*  'IN' DataSource Predicate
Pattern ::= StartTagPattern Pattern* EndTag
StartTagPattern ::= '<' RegularExpression Attribute* '>'
RegularExpression ::= RegularExpression '*' |
RegularExpression '+' |
RegularExpression '.' RegularExpression |
RegularExpression '|' RegularExpression |
<VAR> |
<ID>
BindingAs ::= 'ELEMENT_AS' <VAR>  | 'CONTENT_AS' <VAR>
Predicate ::= Expression OpRel Expression
Expression ::=  <VAR> | <CONSTANT>
OpRel ::= '<' | '<=' | '>' | '>=' | '=' | '!='
OrderedBy ::= 'ORDERED-BY' <VAR>+
DataSource ::= <VAR> | <URI> | <FUN-ID>(DataSource (',' DataSource)*)

 

Condition
Start ::= ( QueryBlock )?
QueryBlock ::= ( Query | ( '<' QueryBlock '>' ) )+
Query ::= 'Select' Select 'Where' Where

A query block consists of one or more queries and zero or more subblocks. Each subblock applies to the query it is following. Otherwise, the order of the queries is irrelevant. The queries are executed unconditionally; the subblocks are executed only if and when, in addition to the query's conditions, their conditions hold as well.

Select Clause
Select ::= ('unique')? Contents
Contents ::= ( | Element | Literal | QueryBlock )+
Element ::= '<' ( | ) ( SkolemID )? ( AttributeList )? ('>' Contents EndTag | '/>' )
SkolemID ::= '=' SkolemFn

A select-clause constructs a piece of the query's result. It consists of one or more of a variable, or some element, or some literal, or some other query block. Elements may have associated semantic oid's, also called Skolem Functions

Where Clause
Where ::= Condition ( ',' Condition )*
Condition ::= TagPattern 'in' DataSource | Predicate
TagPattern ::= '<' PathExpr RestTag ('ELEMENT_AS' | 'CONTENT_AS' )?
RestTag ::= '/>' | ('>' (NestedPattern)? EndTag )
NestedPattern ::= ( TagPattern )+ | | Literal
PathExpr ::= ConjPathExpr ( '|' ConjPathExpr )*
ConjPathExpr ::= KleenePathExpr ( '.' KleenePathExpr )*
KleenePathExpr ::= ( BasicPathExpr ( '+' | '?' | '*' )? ) | '*'

A Where-clause consists of a series of conditions. Each condition binds some variable(s) with a tag pattern, or imposes more restrictions on previously bound variables in a predicate.

Rest
BasicPathExpr ::= ( AttributeList )? | ( AttributeList )? | '$' ( AttributeList )? | BasicPathExpr | '(' PathExpr ')'
AttributeList ::= ( Attribute )+
Attribute ::= ( | ) '=' AttrVal
AttrVal ::= Literal |
SkolemFn ::= '(' ( SkolemArgs )? ')'
SkolemArgs ::= ( ',' )*
EndTag ::= ' )? '>'
DataSource ::= |
Predicate ::= ConjPredicate ( ConjPredicate )*
ConjPredicate ::= BasicPredicate ( BasicPredicate )*
BasicPredicate ::= Expr RelOp Expr | BasicPredicate | Set | '(' Predicate ')'
Set ::= '{' ( Expr (',' Expr )* )? '}'
Expr ::= Term ( ( '+' | '-' ) Term )*
Term ::= Fact ( ( '*' | '/' ) Fact )*
Fact ::= Literal | | Fact | '(' Expr ')'
Literal ::= | | |
RelOp ::= '<' | '<=' | '>' | '>=' | '=' | '~='

Grammar from "XML-QL: A Query Language for XML"

Glossary

  • ADM - Data model for ARANEUS Project. ADM can model the structure of a web site.
  • Anchor - Relation in the WebSQL data model that holds one tuple for each anchor (link) in each document.
  • ARANEUS Project - aims at developing tools for the management of data coming from the World Wide Web.
  • Araneus - Latin word for spider
  • DTD - Document Type Definition; gives a general schema for the XML document.
  • FLORID - (F-LOgic Reasoning In Databases) a deductive object-oriented database prototype employing F-Logic as data definition and query language.
  • F-Logic - Frame Logic - A deductive object-oriented logic used for querying.
  • HTML - Hypertext Markup Language. Developed as an easy way to define pages on the World-Wide Web. The language is similar in design to SGML.
  • Lorel - Language designed for querying semistructured data.
  • Mediator - Program that translates the web source schema into a common schema to be queried by the user.
  • SGML - Standard Generalized Markup Language, a system for organizing and tagging elements of a document.
  • STRUDEL - A web site management system that uses the StruQL query language.
  • UnQL - A language for querying semistructured data that can be modeled as labeled graphs. Allows the user to query both the structure and the data.
  • W3QL - An SQL like query language that is similar to WebSQL. Used in the W3QS project. The language queries structure and content of the web.
  • WebSQL - SQL like language that is used to query the web hyperlinks starting at a certain URL.
  • WebLog - A declarative language used to query the World-Wide Web.
  • WebOQL - System developed at the University of Toronto for extracting data from semistructured sources.
  • Wrappers - programs that translate the schema of a data source to a mediated schema that is presented to the query system.
  • XML - [BPS98] Extensible Markup Language, a system for defining, validating, and sharing document formats. XML is a subset of SGML, the Standard Generalized Markup Language. XML was created to make it possible for users to define their own <tags> within documents.
  • XML-QL - [DFFLS98] A query language for XML written by several experts in the database field.
  • XSL - a standard intended for specifying style and layout of XML documents.

 

 

 

Bibliography

[DFFLS98] Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, and Dan Suciu, XML-QL: A Query Language for XML, Submission to the World Wide Web Consortium, 19-August-1998

[FLM98] D. Florescu, A. Levy, and A. O. Mendelzon, Database Techniques for the World-Wide Web: A Survey, SIGMOD Record, Vol. 27, No. 3, 1998, pages 59-74

[BPS98] Tim Bray, Jean Paoli, and C. M. Sperberg-McQueen, Extensible Markup Language (XML) 1.0, W3C Recommendation, 10-February-1998

[COVER98] Robin Cover, The SGML/XML Web Page, Extensible Markup Language (XML), November 11, 1998

[FLORID98] The FLORID Project, http://www.informatik.uni-freiburg.de/~dbis/florid/

[LOUVRE98] The Louvre Palace and Museum, http://www.lourve.fr/

[ARANEUS98] Database Group of Università di Roma Tre and Database Group of Università della Basilicata, The ARANEUS Project, ongoing web site.
