Minibases: Transitionary Use of Schema-validated XML Data Islands
by Glenn Slayden. Please contact me with corrections or comments. 2003-03-25
Q: How do I create, load, validate, and access XML data islands?
A: Although perhaps much-derided as a Microsoftism, XML data islands do provide a useful migratory step between
the legacy world of HTML and the brave new world of XML. My argument in favor of XML data islands is that, in the specific case that I
outline, the tight binding (co-location) of
data within its processing context is not a detrimental mingling of the two but rather a helpful encapsulation.
An XML data island is a fragment of well-formed (and perhaps valid) XML on an HTML page. It's very easy to create and access this data,
although I found it quite difficult to accomplish validation against an XML schema. This was probably due more to the fact that I was learning
about namespaces, schemas, and validation for the first time during this exercise, rather than any inherent difficulty with data islands. I
can't be sure how difficult it would have been if I had been a schema definition expert.
For the purpose of this blurb, I'll be using the so-called W3C XML Schema 1.0* (May 2001). Note that this is the most
modern schema definition mechanism, as opposed to the two other commonly used systems: DTD and a stillborn Microsoft device (XDR). There are numerous
articles on the web which compare these three systems, and there are also several other fringe systems, but without going into too much detail, suffice
it to say that the W3C XML Schema definition language is the most comprehensive and capable. An official overview of it is available in the
W3C's XML Schema Part 0: Primer.
In the example case, a website contains pages in which a subset of a large database is displayed, but individual items within the subset may
be used on the page many times. It may be possible to reduce the size of the page downloads through normalization—sending only a
single copy of the data items (which
I'll call the minibase), and then using a client-side script such as JavaScript to build the final page from the minibase. As a bonus,
processing cycles are also distributed away from the server to the clients. Rather than spending time formatting HTML, the server now
just analyzes the data dependencies for a particular page, removes duplicates from the list, and prepares the minibase, an XML data island
containing just the data that the client will need to build the page.
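The normalization described above can be sketched in JavaScript. This is only an illustration under assumptions: the record fields (teid, text), the function names buildMinibase and renderPage, and the in-memory database object are all hypothetical, and on a real site the first function would live in whatever server-side code prepares the page.

```javascript
// Hypothetical record shape: { teid: number, text: string }.

// Server side: reduce the list of ids that a page references to one
// record per id. The result is the content of the minibase.
function buildMinibase(pageRefs, database) {
  var seen = {};
  var minibase = [];
  for (var i = 0; i < pageRefs.length; i++) {
    var id = pageRefs[i];
    var known = Object.prototype.hasOwnProperty.call(database, id);
    if (known && !seen[id]) {
      seen[id] = true;
      minibase.push(database[id]);
    }
  }
  return minibase;
}

// Client side: expand the page's references back into display text,
// looking each one up in the minibase instead of repeating the data.
function renderPage(pageRefs, minibase) {
  var byId = {};
  for (var i = 0; i < minibase.length; i++) {
    byId[minibase[i].teid] = minibase[i];
  }
  var out = [];
  for (var j = 0; j < pageRefs.length; j++) {
    var rec = byId[pageRefs[j]];
    out.push(rec ? rec.text : "?");
  }
  return out.join(", ");
}
```

A page that refers to the same item many times still ships that item only once; the repetition is rebuilt on the client.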
Typically, the client-side script would not be subject to change as the minibase changes, so isolating that code as a <script> element
with an external source, which the browser can cache, would reduce download size even more.
At this point, XML mavens will point out that my JavaScript is doing the job of an XSLT transformation. Perhaps so, but as a transitionary
measure, simply moving JavaScript from the ASP page to the client side as described here is a much easier task than a complete paradigm shift.
In other words, complex server-side applications exist which are based on scads of procedural, programmatic code that can be preserved while still
moving into an XML environment.
And now for the code. The HTML file which contains an XML data island and JavaScript to select and display one of the items based on its
attribute value. Text values from the listing:
เก่า khaao F 333333
เก่า xxxyxx F 444444
เก่า xxxyxx F 1
tl.xsd: the XML Schema definition. It must be located in the same directory as the HTML file.
I strongly suggest that you run your schema through a schema validator such as W3C's, and
fix all the problems before attempting to validate XML against it with MSXML/XMLDOM.
Notice that the attribute 'teid' is defined as an unsigned integer, one of the W3C XML Schema built-in datatypes. This means that in the
XPath expression passed to selectSingleNode, we don't have to put quotes around the number we're looking for. However, even though it's a
numeric value, it must have quotes in the XML data island, since XML requires all attribute values to be quoted.
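As a minimal sketch of that lookup, the helper below builds the XPath expression with an unquoted number and hands it to selectSingleNode. The function names teidXPath and findByTeid are hypothetical, and the wildcard element test is an assumption made so that no element names need to be guessed. The xmlDoc argument is assumed to be an MSXML document, such as the one Internet Explorer exposes through a data island's XMLDocument property.

```javascript
// Build an XPath expression that matches any element whose teid
// attribute equals the given number. The literal is left unquoted,
// so the comparison is numeric rather than textual.
function teidXPath(teid) {
  var n = Number(teid);
  if (!isFinite(n) || n < 0 || Math.floor(n) !== n) {
    throw new Error("teid must be a non-negative integer: " + teid);
  }
  return "//*[@teid=" + n + "]";
}

// xmlDoc is assumed to provide MSXML's selectSingleNode method.
function findByTeid(xmlDoc, teid) {
  return xmlDoc.selectSingleNode(teidXPath(teid));
}
```

In the page itself this might be called as findByTeid(island.XMLDocument, 333333), where island is the data island element. The range check is there only to keep stray text out of the expression.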
We can be sure that validation against the schema is actually occurring by changing one of those numeric values for teid in the
data island to a non-numeric value, say by inserting an 'x' into the middle of the number. When you refresh the HTML page, you should
get an error that the validation failed because of a type problem.
One maddening aspect of developing this code was that validation against the schema appears to be finicky and fragile. If the XML processor
doesn't like the slightest thing about your namespaces and the "hook-up" between the data and the schema, it will silently skip the
validation, and the parse error object will still report success. The only way I found to make sure that the validation is actually happening
is to "break" it, as described in the previous paragraph, and see if the error is reported.
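The "break it" check can be wrapped in a couple of small helpers. This is a sketch under assumptions: parseError is taken to look like MSXML's parseError object, with errorCode, reason, and line properties, and loadBroken is a hypothetical callback that loads a deliberately invalid copy of the data and returns the resulting parseError.

```javascript
// Summarize a parse error object. An errorCode of 0 means the parser
// reported no problem, which can also mean that no validation ran.
function describeParseError(parseError) {
  if (parseError.errorCode === 0) {
    return "no error reported";
  }
  return "error " + parseError.errorCode +
         " at line " + parseError.line +
         ": " + parseError.reason;
}

// Returns true only if the deliberately invalid document was rejected,
// which is the evidence that schema validation is really taking place.
function validationIsLive(loadBroken) {
  return loadBroken().errorCode !== 0;
}
```

If validationIsLive returns false for data that you know violates the schema, the hook-up between the data and the schema is the first thing to suspect.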
*In my view, the word "schema" is used inconsistently, even in W3C documents. According to the American Heritage dictionary, a "schema" is
a diagrammatic representation; an outline or model. That is, it is the structure of, or, if you will, a Platonic "form" for, information. To use the
word to represent systems or languages in which such forms are implemented is, to me, incorrect. As such, the proper title of the cited
W3C document should be W3C XML Schema Definition Language 1.0, or W3C XML Schema Schema 1.0 ☺.