
The Role of Document Type Definitions in Electronic Data




This paper looks at the role document type definitions (DTDs) will play in the management of XML-based business-to-business electronic data interchange. It is designed to clarify a number of issues that seem to be causing concern at the beginning of 1999 during discussions relating to the best way to set up and manage shareable repositories of business message types. In particular it looks at: the XML Process Model, the role of XML namespaces, the role of repositories in the process of creating DTDs, the role of repositories in the creation of applications and the role of repositories in the management of DTDs.



How to Promote Organic Plurality on the WWW



Not an academic article, but a developer's article.

This article addresses the risks of data- and workflow- kidnap in the development of schema standards, focussing on an issue that is relevant to BizTalk developers.

Written by a member of the XML Schema WG, this article highlights a problem that the use of W3C standards can impose on the free interoperability of data.



Extracting Schema from Semistructured Data




Semistructured data is characterized by the lack of any fixed and rigid schema, although typically the data has some implicit structure. While the lack of fixed schema makes extracting semistructured data fairly easy and an attractive goal, presenting and querying such data is greatly impaired. Thus, a critical problem is the discovery of the structure implicit in semistructured data and, subsequently, the recasting of the raw data in terms of this structure. In this paper, we consider a very general form of semistructured data based on labeled, directed graphs. We show that such data can be typed using the greatest fixpoint semantics of monadic datalog programs. We present an algorithm for approximate typing of semistructured data. We establish that the general problem of finding an optimal such typing is NP-hard, but present some heuristics and techniques based on clustering that allow efficient and near-optimal treatment of the problem. We also present some preliminary experimental results.

This paper from the database community describes an attempt to derive schemas from document instances (such as WWW personal home pages). This novel approach tries to derive a compact rather than a correct schema, i.e. an imperfect but useful type specification rather than a type specification which is as complex as the original set of instances.
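The idea of a compact but imperfect typing can be illustrated with a small sketch. This is not the paper's algorithm (which relies on monadic datalog and its own clustering heuristics); it is a simplified greedy clustering, and the function names, the Jaccard similarity measure, the threshold and the sample data are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Similarity of two label sets: |a & b| / |a | b| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def approximate_types(objects: Dict[str, FrozenSet[str]], threshold: float) -> List[dict]:
    """Greedily group objects whose outgoing edge labels are similar enough.

    Each resulting cluster acts as one approximate type whose label set is
    the union of its members' labels, so some members lack some labels.
    """
    clusters: List[dict] = []
    for name, labels in objects.items():
        for cluster in clusters:
            if jaccard(labels, cluster["labels"]) >= threshold:
                cluster["members"].append(name)
                cluster["labels"] = cluster["labels"] | labels
                break
        else:
            # No existing cluster is close enough: start a new type.
            clusters.append({"labels": labels, "members": [name]})
    return clusters


# Hypothetical home pages, each described by the labels of its outgoing edges.
pages = {
    "alice": frozenset({"name", "email", "publications"}),
    "bob": frozenset({"name", "email"}),
    "carol": frozenset({"name", "email", "publications", "students"}),
    "acme": frozenset({"company", "products"}),
}
types = approximate_types(pages, threshold=0.5)
for t in types:
    print(sorted(t["labels"]), t["members"])
```

With a threshold of 0.5 the three personal pages collapse into one type even though their label sets differ, which is the trade of precision for compactness that the annotation describes.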

New Approaches to Cataloguing, Querying and Browsing Geospatial Metadata




The main objective of our GeoLens project is to fully leverage geospatial metadata to aid discovery and retrieval of geospatial data. Our ultimate goal is to make NASA's remote sensing imagery and other geospatial data available to the broader public on the Internet. At the core of our solution is the GeoLens Catalog Server, designed to support federation of distributed catalogs and to provide seamless interoperability between different metadata standards. Another important component is the GeoLens Metadata Browser, a client-side application designed specifically to exploit the unique features of our Catalog Server. A prototype implementation of the GeoLens framework is available. We consider this work to be a significant step towards building a true federation of heterogeneous catalogs.

This work comes from the database community and is significant in its requirement for metadata schema integration. Unfortunately, this paper is 3 years old and the project seems to have folded (the web site is no longer available). The principal author is now a director of Information Architects, whose Metaphoria product uses metadata annotations to deal with legacy data and legacy data schemas (see http://www.ia.com/content/devforum/techinfo/docs/technical-whitepaper.htm for more details).

Semantic Metadata for the Integration of Web-based Data for Electronic Commerce




Today, the Internet can be seen as a global marketplace populated by a huge number of providers and consumers that exchange data from a wide range of domains. A combination of data from different sources for further automatic processing is often hindered by differences in the underlying modeling assumptions and representation. In addition, the available sources are in most cases semistructured, i.e., provide no fixed and explicitly specified schema. Therefore, an integrated use of Web-based data requires explicit information about its organization and meaning. In this paper we present a representation model well-suited for explicit description of implicitly described semistructured data, and show how this model can be used for the integration of heterogeneous data sources from the Web.

This paper describes the application of metadata schema (backed by ontologies) to semi-structured and unstructured Web data.

Meta-Data Jones and the Tower of Babel - The Challenge of Large-Scale Semantic Heterogeneity




The popularity and growth of the "Information SuperHighway" (e.g., the Web) have dramatically increased the number of information sources available for use and the opportunity for important new information-intensive applications (e.g., massive data warehouses, integrated supply chain management, global risk management, in-transit visibility). Unfortunately, there are significant challenges to be overcome regarding data extraction and data interpretation in order for this opportunity to be realized. Data Extraction: One problem is the difficulty in easily and automatically extracting very specific data elements from Web sites for use by operational systems. New technologies, such as XML and Web Querying/Wrapping, offer possible solutions to this problem. Data Interpretation: Another serious problem is the existence of heterogeneous contexts, whereby each SOURCE of information and potential RECEIVER of that information may operate with a different context, leading to large-scale semantic heterogeneity. A context is the collection of implicit assumptions about the context definition (i.e., meaning) and context characteristics (i.e., quality) of the information. As a simple example, whereas most US universities grade on a 4.0 scale, MIT uses a 5.0 scale, posing a problem if one is comparing student GPAs. Another typical example might be the extraction of price information from the Web: is the price in Dollars or Yen (if dollars, is it US dollars or Hong Kong dollars), does it include taxes, does it include shipping, etc., and does that match the receiver's assumptions? In this paper, examples of important context challenges will be presented and the critical role of metadata, in the form of context knowledge, will be discussed.

This keynote address to the IEEE Third Metadata conference highlights the need for strict semantic validation of Web documents, but does not explicitly address schemas. Instead it focusses on the problem of data extraction from heterogeneous (legacy) sources and data interchange.
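The GPA example in the abstract can be made concrete with a minimal sketch of context metadata travelling with a value. The `Context` class, the `convert_gpa` function and the linear rescaling are assumptions made for illustration; they are not taken from the paper, and real grade conversions need not be linear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Implicit assumptions made explicit (hypothetical structure)."""
    gpa_scale: float  # highest grade on the scale, e.g. 4.0 or 5.0


def convert_gpa(value: float, source: Context, receiver: Context) -> float:
    """Rescale a GPA from the source's scale to the receiver's scale.

    Linear rescaling is an assumption chosen to keep the sketch short.
    """
    return value * receiver.gpa_scale / source.gpa_scale


mit = Context(gpa_scale=5.0)
typical_us = Context(gpa_scale=4.0)

# A 4.5 on a 5.0 scale, as seen by a receiver that assumes a 4.0 scale.
converted = convert_gpa(4.5, mit, typical_us)
print(converted)  # prints 3.6
```

Without the two `Context` values the receiver would read 4.5 as an impossible grade on its own scale, which is the kind of mismatch the keynote calls semantic heterogeneity.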



Using XML Schemas to define B2B Datatypes




This document illustrates how the kinds of datatypes typically used in business-to-business (B2B) messaging can be defined as part of an XML Schema.


An Extensible Approach for Modeling Ontologies in RDF(S)




RDF(S) constitutes a newly emerging standard for metadata that is about to turn the World Wide Web into a machine-understandable knowledge base. It is an XML application that allows for the denotation of facts and schemata in a web-compatible format, building on an elaborate object-model for describing concepts and relations. Thus, it might turn up as a natural choice for a widely-useable ontology description language. However, its lack of capabilities for describing the semantics of concepts and relations beyond those provided by inheritance mechanisms makes it a rather weak language for even the most austere knowledge-based system. This paper presents an approach for modeling ontologies in RDF(S) that also considers axioms as objects that are describable in RDF(S). Thus, we provide flexible, extensible, and adequate means for accessing and exchanging axioms in RDF(S). Our approach follows the spirit of the World Wide Web, as we do not assume a global axiom specification language that is too intractable for one purpose and too weak for the next, but rather a methodology that allows (communities of) users to specify what axioms are interesting in their domain.

This paper addresses the issues that other commentators have raised about RDF and RDFS, i.e. that they have very limited expressivity as far as schematic constraints are concerned (see "A Comparison of Schemas for Video Metadata Representation" in this theme). It involves the definition of a schema which controls the composition relationships between the vocabulary primitives. This paper presents the semantics of this schema in terms of a frame-logic, but the results are generalisable and provide a tractable way of dealing with complex schemata. Terminologically speaking it addresses ontologies (the authors are rooted in the Knowledge community) but their discussions are equally applicable to all schema description systems.


Ontologies as Conceptual Models for XML Documents



Paper presented at 12th Knowledge Acquisition for Knowledge-Based Systems Workshop (KAW'99), Banff, Canada, October 1999.

Access to XML-based documents currently relies on query languages that are closely tied to the document structures, i.e. when looking for information one has to be aware of this structure and cannot easily specify the information needs conceptually. Our approach uses ontologies to access sets of distributed XML documents on a conceptual level. We integrate conceptual modeling, inheritance, and inference mechanisms on the one hand with the popularity, simplicity, and flexibility of XML on the other hand. We present an approach that defines the relationship between a given ontology and a document type definition (DTD) for classes of XML documents. Thus, we are able to supplement syntactical access to XML documents by conceptual, i.e. real semantic access.

This paper describes the use of ontologies to drive the production of DTDs to allow the XML document syntax to reflect an established complex knowledge structure, rather than trying to interpret a knowledge structure from an overly complex document structure. It does not address many issues of schema, e.g. representation of constraints, but it does address the issue of attaching semantics to a document type.



Is the Web just a large Database




This report summarises the overlapping hot topics of current research interest from the 8th International Conference on the WWW and from the Extending Database Technology Summer School.

This report focusses on the difference in the way that the WWW community and the database research community view metadata schemas.

Metadata - The Key to Content Management Services




Large scale multimedia services have great need of content management systems. Metadata has a key role to play in allowing such systems to be built and automated. A management system composed of multiple distributed components is described. Each component implements a service such as content validation, transcoding, search or generation; and relies on metadata information to achieve success. Metadata specification, security, transport, and storage needs have been identified through the implementation of prototype components. Although the technology employed in each of the management components is often straightforward, the impact on organisational processes and interworking may be significant.

This paper describes work at BT Research Labs to use metadata for multimedia content validation, an application ripe for metadata schemas. However, the work apparently does not use schemas at all.

XML Schema Technical Tutorial




A technical conference tutorial given at Philadelphia in November 1999.



A Comparison of Schemas for Video Metadata Representation




In the past, a lot of effort has gone into generating descriptors and description schemes for video indexing but comparatively little research has been done on schemas capable of defining the structure, content and semantics of video documents and enabling validation and higher levels of automated content checking. This paper compares the capabilities of the RDF Schema, Extensible Markup Language (XML) Document Type Definitions (DTD's), Document Content Description (DCD) and Schema for Object-Oriented XML (SOX), for supporting and validating hierarchical video descriptions based on Dublin Core, MPEG-7 and a specific hierarchical structure. Finally this paper proposes a hybrid schema based on features from each of these schemas which will satisfy the MPEG-7 Description Definition Language (DDL) requirements.

This paper provides a good, practical comparison between a number of schema languages. Unfortunately it only mentions XML Schema in passing as it predates the work on this proposed standard.



Beyond the SGML DTD




This short article outlines the confusion caused by the dual role of SGML and XML DTDs: grammars for the purpose of lexical parsing and schemas for the purpose of semantic validation.

This is an email posting originally sent to the W3C XML Working Group, which was subsequently expanded into several XML conference papers (XML 97 and XML Europe 98). An unofficial version of the latter is available at http://www.ecs.soton.ac.uk/~lac/XMLEurope98/DTDsToSchemas.html.
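The two roles can be told apart with a small sketch. The `order` element, its children and both checking functions are hypothetical; the structural check stands in for a DTD content model such as `<!ELEMENT order (date, quantity)>`, while the second check covers value constraints that `#PCDATA` content cannot express.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def structurally_valid(order: ET.Element) -> bool:
    """Grammar-style check: the right elements in the right sequence."""
    return order.tag == "order" and [child.tag for child in order] == ["date", "quantity"]


def semantically_valid(order: ET.Element) -> bool:
    """Value-level check: a real calendar date and a positive quantity."""
    try:
        datetime.strptime(order.findtext("date"), "%Y-%m-%d")
        return int(order.findtext("quantity")) > 0
    except (TypeError, ValueError):
        return False


# Parses and matches the content model, but month 13 does not exist.
doc = ET.fromstring("<order><date>1999-13-45</date><quantity>2</quantity></order>")
print(structurally_valid(doc), semantically_valid(doc))  # prints True False
```

A document can therefore pass every check a grammar is able to make and still carry data that no receiving application could use.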

Literature Overview



Background: The necessity for some kind of semantic validation is given by an IEEE Metadata conference keynote paper (see "Managing Standards Compliance") and specifically the need for metadata schemas and repositories (see "Metadata - The Key to Content Management Services").

Schema Architectures: A family tree showing the relationships between the various standards for schemas (see "Family Tree of Schema Languages for Markup Languages") and a comparison of some of the major schema standards in a practical application (see "A Comparison of Schemas for Video Metadata Representation") reveal the strengths and weaknesses of various approaches. A critique (see "An Extensible Approach for Modeling Ontologies in RDF(S)") of one of the latest standards (RDF-Schema) with respect to the expression of schema constraints also shows a practical approach for overcoming its problems. Contrariwise, the complications of XML Schema (see "Web-based Information Access" for a tutorial) have been recently addressed by a simple formalism (see "Web-based Information Access" and "Hedge automata - a formal model for XML schemata"). The dual purpose of a DTD in terms of lexical parsing and semantic validation is presented in "Beyond the SGML DTD", and a DTD-based mechanism for overcoming the complexities of the schema standards is presented in "Semantic Metadata for the Integration of Web-based Data for Electronic Commerce". Finally, the differing perspectives on schemas of the WWW and database communities are discussed in "Is the Web just a large Database".

Schema Development: Schemas may be created from existing ontologies (see "New Approaches to Cataloguing, Querying and Browsing Geospatial Metadata") or by analysing collections of document instances (see "Extracting Schema from Semistructured Data"). The use of multiple schemas may call for an integration (see "Mostly Metadata - A Bit Smarter Technology") or mediation (see "The Role of Document Type Definitions in Electronic Data") effort.

Specific Schemas: Some particular schemas are covered for bibliographic description of digital library objects (see "A Metadata Architecture to Represent Electronic Documents on the Web"), collections of interlinked Web pages (see "A Schema-Based Approach to Web Engineering") and business-to-business communications (see "Toward Unified Metadata for the Department of Defense"). In the multimedia arena, schema techniques are described for content validation (see "Meta-Data Jones and the Tower of Babel - The Challenge of Large-Scale Semantic Heterogeneity") and managing constraints (see "XML-Constraints with Scheme").

Application and Practical Issues: There are pitfalls in rigidly adhering to schema standards (see "How to Promote Organic Plurality on the WWW"), and an approach where schema compliance is not enforced during all stages of a document's life-cycle may be useful (see "Literature Overview"). Applications of XML schemas and repositories in E-commerce are described in "Toward Unified Metadata for the Department of Defense" and "Using XML Schemas to define B2B Datatypes".


A Metadata Architecture to Represent Electronic Documents on the Web




Metadata have long played an important role in database systems as catalogue data used to support database management system operations. Only recently have metadata also been employed to describe digital resources available across networks. This paper presents a formal structure for representing and describing electronic documents on the Web. It is based on a metadata conceptual model, and although the idea of creating a model for metadata management is not new, this model explores relationships which allow the association of information resources at different levels of granularity.

A proposal from the Digital Libraries arena for a bibliographic schema to describe document objects stored in a digital library.

Toward Unified Metadata for the Department of Defense




Data sharing within and among large organizations is possible only if adequate metadata is captured. But administrative and technological boundaries have increased the cost and reduced the effectiveness of metadata exploitation. We examine practices in the Department of Defense (DOD) and in industry's Electronic Data Interchange (EDI) to explore pragmatic difficulties in three major areas: the collection of metadata; the use of intermediary transfer structures such as formatted messages and file exchange formats; and the adoption of standards such as IDEF1X-97. We realize that in large organizations, a complete metadata specification will need to evolve gradually. We are concerned here with initial steps. We therefore propose a simple framework, for both databases and transfer structures, which can accommodate varying degrees of metadata specification. We then propose some conceptually simple (but rarely practiced) techniques and policies to increase metadata reusability within this framework.

This older paper takes a DoD/ISO/databases perspective on the problem of mediation between metadata schemas.

DTD Transformation by Patterns and Contextual Conditions



Paper presented at SGML/XML'97 (12/8, 1997)

This presentation shows transformations of DTD's and XML documents. Each operator transforms not only XML documents, but also DTD's. The output instances are guaranteed to conform to the output DTD's. Furthermore, the output DTD's are minimally sufficient; in other words, they allow only those XML documents which may be generated. Operators are controlled by patterns and contextual conditions. Patterns are conditions on (possibly non-immediate) subordinate nodes, and contextual conditions are conditions on non-subordinate nodes (e.g., superior nodes, ancestor nodes, sibling nodes, and subordinates of sibling nodes). Such DTD transformation is highly important for at least two reasons. First, it helps DTD evolution; by writing an update program with operators, we can update not only instances but also DTD's. Second, transformation among different DTD's becomes a lot easier; we can examine DTD's created by transformation operators. This work is based on the theory of tree automatons. A DTD is first translated to a tree automaton. This tree automaton is then repeatedly transformed. Finally, this tree automaton is translated back to a DTD. Patterns and contextual conditions of operators are also captured as tree automatons. In this presentation, we demonstrate an example of DTD transformation rather than providing theoretical details.

This paper is important because it addresses both hedge automata (a very recent entry into the XML Schema effort) and their application to managing the process of schema evolution.



The Role of Architectural Forms in XML-EDI




This paper suggests how some of the Architectural Form Definition Requirements expressed in Annex A-SGML Extended Facilities of ISO/IEC 10744-1997 could be used to simplify the creation and management of XML document type definitions (DTDs) that are designed to be used for business-to-business electronic data interchange (EDI).

This paper presents the SGML Architectural Form as an alternative mechanism for XML document semantics. The advantage which it offers is that AF's are a technique which can be immediately realised using current XML standards and tools. However, the disadvantage is that AF's provide only a labelling mechanism that can be used to map specific kinds of elements to specific processing semantics. True schemas declare the explicit data semantics and constraints, whereas AF's declare conformance to a semantics which is hidden in a processing module. However, it may well be (as the author states) that AF's provide a short-term solution for instance documents of well-understood schema.


A Schema-Based Approach to Web Engineering




Modern WWW applications are highly integrated, rapidly changing, distributed multi-user systems which must take into account system and enterprise critical non-functional requirements. We present a twofold approach to web engineering: the static part addresses all aspects that belong to the contents of pages, text and images, while the dynamic part deals with interaction with databases and other application systems. We will focus on the static part, called document engineering. It involves the tasks of document design, authoring, and document production. Our approach to web engineering will suggest the reusable components content, structure, navigation, and layout. We will present SchemaText, an integrated software tool, that provides a schema-based approach to web engineering. SchemaText implements the basic ideas of our methodology.

From the hypertext community, this paper describes an attempt to provide a schema-based system that controls authoring-in-the-large, i.e. the construction of a Web site or Web application consisting of interlinked independent pages.



Mostly Metadata - A Bit Smarter Technology




Metadata has been commonly referred to as data about data. This definition unfortunately misses the key point about metadata: structure. Metadata is more appropriately defined as structured data about data. This structure is the crucial element that gives metadata the edge over full-text indexing. The benefit is that the structure alludes to the semantics of the metadata. Metadata will become more prevalent in future information systems and will be relied on to provide more than just descriptive functionality. Interoperability will be the major issue facing metadata communities. It covers the areas of registration of schemas, extensibility and inheritance, and internationalisation.

This paper provides useful background reading on the necessity for metadata schemas and the requirements for schema repositories.

Hedge automata - a formal model for XML schemata




This note shows preliminaries of the hedge automaton theory. In the XML community, this theory has been recently recognized as a simple but powerful model for XML Schemata. The design of RELAX (REgular LAnguage for XML) is directly based on this theory.
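A rough sketch may help show what a hedge automaton does: each element is assigned a state bottom-up, and a rule applies when the sequence of its children's states matches a regular expression. The rules, state names and document vocabulary below are invented for illustration, text nodes are ignored, and the first matching rule wins, which is a deterministic simplification rather than the general theory or the design of RELAX.

```python
import re
from typing import List, Optional, Tuple

# A tree is (element label, list of child trees).
Tree = Tuple[str, list]

# Rules: (element label, regex over the children's states, resulting state).
# States are single characters so that concatenated child states are unambiguous.
RULES = [
    ("title", r"", "T"),
    ("para", r"", "P"),
    ("section", r"T(P|S)*", "S"),  # a title, then paragraphs or nested sections
    ("doc", r"TS+", "D"),          # a title, then one or more sections
]


def assign_state(tree: Tree) -> Optional[str]:
    """Return the state for a tree, or None if no rule applies."""
    label, children = tree
    child_states: List[str] = []
    for child in children:
        state = assign_state(child)
        if state is None:
            return None
        child_states.append(state)
    word = "".join(child_states)
    for rule_label, pattern, state in RULES:
        if rule_label == label and re.fullmatch(pattern, word):
            return state
    return None


def accepts(tree: Tree) -> bool:
    """A document is accepted when its root reaches the final state D."""
    return assign_state(tree) == "D"


good = ("doc", [("title", []), ("section", [("title", []), ("para", [])])])
bad = ("doc", [("section", [("para", [])])])
print(accepts(good), accepts(bad))  # prints True False
```

The second document is rejected because its section has no title, so no rule assigns the section a state and the failure propagates to the root.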


Managing Standards Compliance



IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 25, NO. 6, NOVEMBER/DECEMBER 1999

Software engineering standards determine practices that "compliant" software processes shall follow. Standards generally define practices in terms of constraints that must hold for documents. The document types identified by standards include typical development products, such as user requirements, and also process-oriented documents, such as progress reviews and management reports. The degree of standards compliance can be established by checking these documents against the constraints. It is neither practical nor desirable to enforce compliance at all points in the development process. Thus, compliance must be managed rather than imposed. We outline a model of standards and compliance and illustrate it with some examples. We give a brief account of the notations and method we have developed to support the use of the model and describe a support environment we have constructed. The principal contributions of our work are: the identification of the issue of standards compliance; the development of a model of standards and support for compliance management; the development of a formal model of product state with associated notation; a powerful policy scheme that triggers checks; a flexible and scalable compliance management view.

This paper describes a document schema based on a model for standards compliance. It describes the use of this schema (defined in UML) in an environment where satisfaction of the schema is not required at all stages of the document's evolution. The paper is not Web-oriented and does not describe an XML-based system.

Family Tree of Schema Languages for Markup Languages




A diagram showing the historical relationships between the various schema languages developed in the last 15 years for controlling data in SGML and XML.

An indispensable guide to various data constraint mechanisms. Taking the form of a timeline plotted against underlying abstraction (inheritance, grammar and patterns), it is very useful for helping to tell DCD from XML-Data from XML Schema. The author is a member of the XML Schema Working Group.



Web-based Information Access




The need for friendly environments for effective information access is further enforced by the growth of the global Internet, which is causing a dramatic change in both the kind of people who access the information and the types of information itself (ranging from unstructured multimedia data to traditional record-oriented data). To cope with these new demands, the interaction techniques traditionally offered to the users have to evolve and eventually integrate in a powerful interface to the global information infrastructure. The new interaction mechanisms must be especially friendly and easy-to-use, since, given the enormous quantity of information sources available on the Internet, most of the users remain "permanent novices" with respect to each one of the sources they have access to. This tutorial offers a survey of the main approaches adopted for letting the users effectively interact with the Web. Thus, it covers topics related to both extracting the information of interest spread over existing Web sites and building new, more usable, sites. Being mainly "user-centered", the tutorial will analyze proposals coming from different areas, namely DB, AI, and HCI, which share the final goal of making the Web a huge, easy-to-access, information repository.

This paper provides an excellent background on the ways in which the database, AI and HCI communities are attempting to make the Web into an integrated information repository. In particular it focuses on the DB emphasis on providing a post-hoc structural overlay to the Web and the AI emphasis on knowledge representation.

XML-Constraints with Scheme




This paper (which appeared in XML Europe 99) describes a (non-standard) mechanism for adding constraints to DTDs for the purpose of multimedia authoring.


 
  Developed by the IAM Research Group, University Of Southampton, UK
and funded by the Defence Evaluation and Research Agency (DERA), UK