First posted: 05/19/2000
Washington, September 15-18, 1999 – London, November 21-24, 1999
Real World XML - Examples of XML at Work
Michael Corning
Tried and True
My previous talk made the case for
developing an inventory of reusable user interface patterns as the first
obligation of the ASP/XML development community. From that inventory of parts
can come the remarkable array of useful XML applications we are all ready and
willing to develop.
This talk changes the focus a little bit
and covers another important attribute of any new technology: does it work? Is
it effective? Does it let us do something we couldn’t do before, or in some
other way do better than we could do before?
In the following examples I put my money
where my mouth is. I show you some tools I’ve developed at Microsoft that
really did qualify, if I do say so myself, as useful. Indeed, compelling
evidence for their utility is evinced by the adoption of these tools by others
than their author. I plan to offer exactly such evidence.
Following in the spirit of my first talk,
there are some reusable components in these tools as well. Principally the
csv2xml.asp application shows you how to take any given csv file and build an
XML DOM in memory from its contents. Further, this application can create two
such DOMs so that the contents of these two DOMs can be compared. As you will
see, this ability comes in very handy when troubleshooting one of the most
vexing problems we’ve all faced from time to time: why does one of two
“identically” configured machines work while the other fails?
The second XML-based application uses XML
to export virtual memory allocation data. Further, this data is rendered
graphically using an XSL stylesheet. I will compare the VM data presented in
both tabular format and graphically. I leave it to each of you to decide which
format you would prefer to use.
In all cases, these examples of XML
applications are not samples, they are hard working programs constantly used by
Microsoft Product Support Services. Who knows, perhaps your next call to PSS
will show you just how effective they are.
Oh, and if your next support call involves
tracking down version-control information (especially for you Oracle users out
there), and the Support Professional doesn’t bring one or both of these tools
out of his tool kit, give him my email address and I’ll see that this SP knows
about the appropriate application.
Michael Corning
Michael Corning is a Memetic Engineer on the Application Server Team at
Microsoft. Currently, his job is to automate the App Server test process
using XML and ASP. Corning's fifteen minutes of fame came from being
fortunate enough to have coauthored (with Steve Elfanbaum and David Melnick)
the first book on ASP, Working with Active Server Pages (Que, 1997). Since
then Corning has written extensively on topics of interest to ASP and XML
developers. One European reviewer of his book aptly noted that "the
author could scarcely contain his enthusiasm." Corning brings that
patented passion to the podium before audiences the world over whenever he
gets a chance to speak about software that, he believes, will have a
measurable impact on the course of human development.
An Inventory of Processes
My former colleague on the Critical Problem
Resolution Team at Microsoft, Greg Winston, produced a totally cool tool for
verifying symbolic information on any given machine. In the process, Greg’s
“checksym” program took an inventory of all the dlls loaded in any given
(or every) process running on any machine where checksym.exe was installed.
While Greg was at it, he captured version information for each of those dlls.
He had to dig a little deeper than usual, but he even managed to figure out
where Oracle stored their dll version info. The checksym.exe program then
persists this information to the file system in the common csv file format.
Users could then use a spreadsheet or database program to access, sort, and
filter this data. There were only two little problems:
1. your machine had to have one of those viewers installed and you had
to go through the steps to get the csv file into those viewers;
2. any analysis of the data was also manual.
Csv files are dead, static files. XML
trees, on the other hand, are alive and seething with possibilities. So I set
about to write an asp application that would input a pair of csv files and save
XML files. Along the way the asp app would analyze the data in the files
automatically divining the important information latent in them.
I should add here, that I had several
discussions with Greg imploring him to at least export checksym.exe output as
XML, but my suggestion had to wait until he had implemented a handful of
previously planned enhancements to his utility. The point here is that I was
able to use ASP to develop a post processor that gave the user
everything they needed to exploit the power of XML with checksym without having
to rely on checksym’s author to implement those services. And yes, I could have
implemented csv2xml in Visual Basic, but I stayed with ASP because
- the asp application could stay on my
server and be used by all of the Support Professionals in PSS, and
- I wanted to be sure everyone had the
“viewer” software (an assumption I could not make about spreadsheets or
database programs). IE 5.0 is not only ubiquitous, it’s almost always running
on every Support Professional’s desktop.
There is one other story I can tell from
work. One day I was working a particularly nasty memory leak issue in Site
Server 3. Engineers all over the world were reproducing the memory leak…except me.
Whenever this happens the first thing I do is confirm that two machines have
the same software and versions. In the past, this assurance has been taken at
face value. I ask either the customer or a colleague what versions and service
packs they use and compare the answer to my own system. On one such occasion I
asked an engineer in North Carolina to confirm that he was running Site Server
3/Service Pack 2. He assured me he was, yet he still showed a leak. I asked him
to run TLIST on his IIS process…sure enough, he was still on SP1. It was at that
moment that I resolved to build a tool that would take all the guess work and
uncertainty out of this key troubleshooting step. That tool is csv2xml.asp.
So, how does it work?
Processing CSV Data
There are two distinct processes going on
in this application. The first one manipulates the data, and the second one
displays it. This application uses a variation on the original XML spreadsheet
you saw me discuss in my first presentation, so I will not cover any of those
previous comments here. Instead, this section of the current presentation
discusses some of the finer points of processing CSV files and, more
importantly, I cover some uses of the XMLDOM unique to this application (in
particular, the notion of strongly typed data).
Creating Objects
First I use so-called “static” objects in
ASP to instantiate the components used by csv2xml.asp.
Since the tags are cited in a single .asp
page, each of these instances has only page scope. Generally speaking, by the
way, the <OBJECT> tag is the best way to create objects in ASP. And for those who
take me seriously, when you run back to work and start commenting out your var
x=new ActiveXObject("myApp.itsClass") statements and replacing them
with <OBJECT> tags, remember to comment out the variable declarations, too;
otherwise the scripting engine will see only the declared variable and will not
instantiate the object on the first method call (and the first method call will
then raise a runtime error).
Two Code Paths
At any rate, once I have my components I
can start processing the raw text data exposed by checksym.exe. This data is processed one of two ways. If only one file is being
processed, an XML tree is created from the csv data and the XML is displayed by
the XML spreadsheet. If two files are coming into csv2xml.asp then each csv
file is converted into its own XML tree and the two trees are analyzed for
differences. These differences are displayed in IE 5 using the default
stylesheet as a collapsible outline.
I’ll now walk through the code path for
both kinds of processing.
Common Code
Both code paths have one function in
common, convertCSV(). The first thing this
function does is get the csv file data into an instance of the filesystem
component. Next the convertCSV()
function starts to build the xmldom. Here is the code fragment:
This code section checks to see if a single
file is being processed or two. If only a single file is converted then the XML
data will be displayed in the XML spreadsheet and the XML will need to be
processed with the XML spreadsheet’s xsl stylesheet using the strongly-typed
data provided by the schemaCheckSym2.xml file. Otherwise, a simple document
element, checksym, is all that’s required for comparing two XML trees.
Note the syntax for creating Processing
Instruction nodes. Though the PI node looks like an ordinary XML node (at least
on the inside once you get past the “?” characters), you don’t manipulate it
the same way. That is, those things that look like attributes aren’t attributes
in the sense of, say, the document element. The difference is that the values
acceptable to the PI are not free form as they are with the document element. A
stylesheet PI, for example, needs the name of the PI, the type of stylesheet
used, and optionally the location of the stylesheet. When this optional
location is null the PI will assume that the stylesheet (usually an inline
Cascading Stylesheet) is internal. The point is that you can’t use an “xmlns”
attribute, for example. In other words, since PI attributes aren’t negotiable,
there’s no need for a generic method like setAttribute().
The next for() loop takes the information at the top of the csv file and converts
it to attributes on the document element. These data apply to the entire
checksym namespace so the document element is the logical place to store them.
Since I know the structure of the csv file and it never changes, I know I have
to skip three lines before I start processing the process’s data. Once I get
past all the generic data, I start doing the heavy lifting in this application.
Before I get into the details of the conversion process, I want to take a
second and discuss some of the limitations of this design, for both of the
following issues will probably occur to you, too, should you decide to create
your own csv converter.
Design Second Thoughts
The first design of this application took
the csv file at face value and processed every field and row of csv formatted
data. Only later did I realize that this wasn’t very efficient. That is,
csv2xml.asp is designed to convert dll inventory data for a single process, so
why am I processing every row of data that will not change? Upon closer
scrutiny I discovered a related problem. The companyName field has far fewer than n distinct values, where n is the number of rows. For example, the IIS process is
dominated by Microsoft dlls, yet I process every row the same way. One
consequence of this inefficiency is that if I want to filter for a given
company’s dlls I have to use a stylesheet to filter and extract distinct values
from this companyName field. The alternative
is to use the hierarchical structure of XML files to model this parent/child
problem. You can see how I changed the processing to exploit this feature in
the csv2xml2.asp file.
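The parent/child idea can be sketched in a few lines of plain JavaScript. This is only an illustration under stated assumptions, not the code in csv2xml2.asp: the row objects, the element names (process, company, module), the helper names and the vendor entry are all hypothetical.

```javascript
// Hypothetical flat rows, shaped the way a converter might produce them.
var rows = [
  { module: "MSVCRT.DLL", companyName: "Microsoft Corporation" },
  { module: "VENDOR.DLL", companyName: "Example Vendor Inc." },
  { module: "KERNEL32.DLL", companyName: "Microsoft Corporation" }
];

// Collect module names under one entry per distinct company.
function groupByCompany(rows) {
  var companies = {}; // companyName -> array of module names
  for (var i = 0; i < rows.length; i++) {
    var name = rows[i].companyName;
    if (!companies.hasOwnProperty(name)) companies[name] = [];
    companies[name].push(rows[i].module);
  }
  return companies;
}

// Escape the characters that would break an attribute value.
function escapeAttr(text) {
  return String(text).replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Serialize the grouping; the company name is now stored once per group
// instead of once per row (element names are illustrative only).
function toXml(companies) {
  var xml = "<process>";
  for (var name in companies) {
    if (!companies.hasOwnProperty(name)) continue;
    xml += '<company name="' + escapeAttr(name) + '">';
    for (var j = 0; j < companies[name].length; j++) {
      xml += '<module name="' + escapeAttr(companies[name][j]) + '"/>';
    }
    xml += "</company>";
  }
  return xml + "</process>";
}

var grouped = groupByCompany(rows);
var xmlText = toXml(grouped);
```

With this shape, filtering for one company's dlls becomes a matter of selecting a single parent node rather than extracting distinct values in a stylesheet.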
Another weakness of the original design is
that it hardwires the field names into XML node names. This is understandable
since I assumed that the structure of checksym’s csv file was static. Of
course it took Greg Winston only a few versions beyond the one I used to create
csv2xml.asp before he had to add a
field. When he did that, csv2xml.asp
broke. I will probably get around to making csv2xml.asp more flexible, but for now, it’s “a feature.”
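One possible way to make a converter tolerate new fields is to read the field names from the data instead of hardwiring them. The sketch below assumes the csv has a header line naming its columns, which may or may not match checksym's actual layout; recordsFromCsv and the sample values are hypothetical.

```javascript
// Build one record per data line, taking the field names from the first line.
// A column added later simply shows up as one more property.
function recordsFromCsv(lines) {
  var fieldNames = lines[0].split(",");
  var records = [];
  for (var i = 1; i < lines.length; i++) {
    var values = lines[i].split(","); // naive split; quoted commas are ignored here
    var record = {};
    for (var j = 0; j < fieldNames.length; j++) {
      // Missing trailing values become empty strings rather than undefined.
      record[fieldNames[j]] = j < values.length ? values[j] : "";
    }
    records.push(record);
  }
  return records;
}

// Invented sample data for illustration only.
var records = recordsFromCsv([
  "module,fileVersion,companyName",
  "MSVCRT.DLL,6.0.8337.0,Microsoft Corporation",
  "KERNEL32.DLL,4.0.1381.133"
]);
```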
Parsing CSV Data
The basic process going on inside convertCSV() is that each line of text is read from the filesystem object and split into an array using the “,” as a delimiter. This
is the way text data comes to my application; it is not the way I’d prefer. The
first problem with simple comma-delimited text is that some values contain
commas. Greg enclosed such fields with quotes, and I wish he’d delimited
everything with quotes; quotes would have been easier to work with since all
fields would be processed the same and embedded commas would pass through into
the array. As it is, I need to know ahead of time which fields are dates, then
I need to concatenate two consecutive fields back together (dates are split in
two by the JScript split()
function) and pass them to a date converter.
function convertDate(strDate)
{
    // strip the enclosing quotes
    var cleaned=strDate.replace(/\"/g,"");
    // keep the time portion (hh:mm:ss), if present, for the end of the result
    var time=cleaned.match(/\d{2}:\d{2}:\d{2}/);
    var newDate=new Date(cleaned);
    var month=newDate.getMonth()+1;
    var day=newDate.getDate();
    // pad date and month for ISO 8601 compliance
    return newDate.getFullYear()+"-"+
        (month<10?"0"+month:month)+"-"+
        (day<10?"0"+day:day)+
        (time==null?"":"T"+time[0]);
}
The input parameter looks something like
"\"April 29 1999 16:09:50\"" so we need to clean this up.
First we strip off the quotes with the replace() function. Then we copy the
time part of the date to a separate variable that we will use at the end of the
convertDate() function. Next, we
make a new JScript date value. This new date has that odd JScript format, Thu
Apr 29 16:09:50 PDT 1999, so we start breaking it into parts. Finally, we return
an ISO 8601 formatted string (which requires leading zeros) to convertCSV(). The ISO format is used by the stylesheet’s formatDate() function.
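One way to sidestep re-joining fields that split() has cut in two is a splitter that treats a quoted field as a unit. The following is a minimal sketch, not the code used in csv2xml.asp; splitCsvLine is a hypothetical helper, and the sample line is invented on the assumption that the raw date field carries a comma inside its quotes.

```javascript
// Split one csv line on commas, except commas that sit inside double quotes.
// The quote characters themselves are dropped from the output.
function splitCsvLine(line) {
  var fields = [];
  var current = "";
  var inQuotes = false;
  for (var i = 0; i < line.length; i++) {
    var ch = line.charAt(i);
    if (ch === '"') {
      inQuotes = !inQuotes;          // entering or leaving a quoted field
    } else if (ch === "," && !inQuotes) {
      fields.push(current);          // an unquoted comma ends the field
      current = "";
    } else {
      current += ch;                 // ordinary character, or a quoted comma
    }
  }
  fields.push(current);              // the last field has no trailing comma
  return fields;
}

var fields = splitCsvLine('MSVCRT.DLL,"April 29, 1999 16:09:50",6.0');
```

Here the date arrives as a single array element, so no knowledge of which columns hold dates is needed just to split the line. The sketch does not handle doubled quotes used as an escape inside a quoted field.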
Rendering Data as Tool Tips
There is one other bit of extra processing
I do on the csv data. To save screen space I store some of the data about any
given dll in the TITLE attribute of the module field. I do
this in the splitModulePath() function.
This function accepts a string such as
"C:\WINNT\SYSTEM32\MSVCRT.DLL",
finds the last “\”, and stores the path for the dll as the path
attribute of the module node. This new node is then appended to the process
node.
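The string handling at the heart of that step can be sketched as follows. This is an assumption about the approach, not the original splitModulePath(): the version here returns a plain object with hypothetical path and name properties instead of building an XML node.

```javascript
// Split a full module path at its last backslash.
function splitModulePathSketch(fullPath) {
  var cut = fullPath.lastIndexOf("\\");
  return {
    // Everything before the last backslash; empty if there is no backslash.
    path: cut < 0 ? "" : fullPath.substring(0, cut),
    // Everything after the last backslash (the whole string if none is found).
    name: fullPath.substring(cut + 1)
  };
}

var parts = splitModulePathSketch("C:\\WINNT\\SYSTEM32\\MSVCRT.DLL");
// parts.path holds the folder, parts.name holds the file name
```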
Compare XML Trees
The last thing convertCSV() does before it passes control to the compareDOMs() function is reference each document element in an array. This makes
a convenient mechanism for the compareDOMs()
function to switch back and forth between the two XML trees. Another technique
(and one I now wish I had used) would have been to copy each documentElement to a child of a containing xmldom instance’s documentElement. At any rate, I will close now with a discussion of the mechanism
used to compare the two trees.
The first thing compareDOMs() does is create a new xmldom instance loading an explicit XML
string,
compareDOMs()
then creates some memory variables to store the computer name of each machine
being analyzed. Here’s one such assignment where the value comes from selecting
the computer attribute of the document element.
var pc0=xmldoms[0].selectSingleNode("/checkSym/@computer").nodeValue;
compareDOMs()
then goes into a loop for the number of dlls in the process (which for IIS is
over 150 iterations). Each iteration has a vector of numbers and text for each
field of the csv file. I access this vector through the childNodes() property of the gridX object. In particular I am interested in the
values of childNodes() 0, 3, 8, and 10; or
module, timeDateStamp, fileVersion, and fileDescription respectively.
One of the tricks of doing this XML
spreadsheet-like navigation is not to refer to individual values and their
siblings. The trick is to make the “row” the object of interest then index into
the row through the childNodes()
property. It's all very confusing - this mixing of relational and tree metaphors,
but you get used to it.
Once I have the values for each of these
variables in the subject grid object (that is, the grid-like csv data in one or
the other XML tree) I first look in the complementary object for the current
module. If I find the other grid has that module name, I compare the linker
dates of the dll from both grids. If the linker dates differ, I store the
module data from both grids in two child nodes off a newly created file node
off of my xmlCheckFiles tree.
If the comparison of module names comes up
null, then this means the module is missing from the corresponding XML tree. I
then pass this module info (including the identity of the computer missing the
file) to the appendMissingFile() function. It adds another node in the missingFiles node of xmlCheckFiles.
When I’m done processing the first XML
tree, I start on the second one, but this time I only have to check for missing
files. I find files missing on the first tree in precisely the same manner as I
did when I checked the second tree.
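The two-pass comparison described above can be sketched without any XML machinery. This is a simplified illustration, not compareDOMs() itself: plain objects stand in for the two trees, and the function names, property names and sample inventories are hypothetical.

```javascript
// Index a module list by module name for quick lookup.
function indexByModule(modules) {
  var index = {};
  for (var i = 0; i < modules.length; i++) index[modules[i].module] = modules[i];
  return index;
}

// Pass 1 walks the first machine: report differing linker dates and
// modules missing from the second machine. Pass 2 walks the second
// machine and only needs to report modules missing from the first.
function compareInventories(pc0, pc1) {
  var result = { changed: [], missing: [] };
  var index0 = indexByModule(pc0.modules);
  var index1 = indexByModule(pc1.modules);
  var i, m;
  for (i = 0; i < pc0.modules.length; i++) {
    m = pc0.modules[i];
    if (!index1.hasOwnProperty(m.module)) {
      result.missing.push({ module: m.module, missingFrom: pc1.computer });
    } else if (index1[m.module].timeDateStamp !== m.timeDateStamp) {
      result.changed.push({
        module: m.module,
        dates: [m.timeDateStamp, index1[m.module].timeDateStamp]
      });
    }
  }
  for (i = 0; i < pc1.modules.length; i++) {
    m = pc1.modules[i];
    if (!index0.hasOwnProperty(m.module)) {
      result.missing.push({ module: m.module, missingFrom: pc0.computer });
    }
  }
  return result;
}

// Invented sample inventories for two "identically" configured machines.
var report = compareInventories(
  { computer: "PC-A", modules: [
    { module: "MSVCRT.DLL", timeDateStamp: "1999-04-29T16:09:50" },
    { module: "ONLYONA.DLL", timeDateStamp: "1999-01-01T00:00:00" } ] },
  { computer: "PC-B", modules: [
    { module: "MSVCRT.DLL", timeDateStamp: "1998-10-01T08:00:00" },
    { module: "ONLYONB.DLL", timeDateStamp: "1999-01-01T00:00:00" } ] }
);
```

The second pass can skip the date comparison because every module present on both machines was already compared during the first pass.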
You’ll see what all this looks like if you
run csv2xml.htm and use the default
files for comparison. These are real examples of two “identically” configured
machines. So you can quickly see how useful this little utility is, and how
much time it can save you when you set out to prove (or confute) the claim that
two machines have the same configuration.
A Look Inside Virtual Memory
The second application I built while on the
CPR team was an extension of code adapted by my teammates from Jeff Richter’s
Advanced Windows Programming, Third Edition. In his book, Jeff has a program
that explores the way Windows manages memory. I’m tempted to detour just a bit
into that intriguing and important topic, but I will not. Instead, I will show
you how I used XML to provide a view on the data provided by Jeff’s programs
(my teammates didn’t change the view of the data, just a wee bit of the way the
code was implemented).
Getting Set Up
I suppose it would be helpful here, just in
case you’re reading this at a moment when you can follow along with source
code, to tell you how to set up the necessary files for the VMMap utility. So, first thing, let’s get the weird stuff out of the way.
First unzip the VMMap zip file.
Then copy all the files (including the single .dll file) into a folder on your
web site. This Web site needs to be configured to run .asp files. Another
important step: while Windows Explorer is pointing to this directory,
add Write permissions to the folder that contains the VMMap files you just copied. The .dll will be writing to the file system,
so if you overlook this step you will raise a runtime error. Finally, register
the dll. If you don’t already have a copy of the .reg file included in the VMMap.zip file, then double click the one I provide; as a result, you will
now have context menu support for registering and unregistering dlls, and you
will also be able to register a .dll by “executing” it (meaning either with the
enter key, or with a single or double click, depending on your desktop
settings). To run the program, simply execute the WriteStats.asp file.
How it works
The WriteStats.asp file couldn’t be easier.
First we instantiate the VMMap server like so:
This version of the .dll writes immediately
to the file system using the text passed in to it. Actually, it munges the name
a bit (see the data island below), so if you want to make repeated and separate
calls for virtual memory allocations, modify the code to use different names.
We’re working on a more sophisticated version that tracks memory allocations over
time, storing them in an xmldom inside the component.
Next, the .asp file opens up two xmldoms:
<xml src="Test__VMStat.xml" id="xml"></xml>
<xml src="vmMap2.xsl" id="xsl"></xml>
The first data island will point to the XML
file created by the component, and the second one points to the stylesheet
needed to convert the data to a picture.
The minimalist script merely does a transformNode() on the VM data after the window onload event has fired. It sends
the resulting text into the innerHTML property of the waiting DIV tag. I use
the innerHTML property because I’m exploiting some of the attributes of the
SPAN tag. So let’s take a look at the stylesheet, then. Here’s the statement:
I left a few extra comments about the
alternative usages in the source code, including a note that Jonathan
Marsh (Microsoft XSL Program Manager) says the above statement shouldn’t work
<g/>.
The Stylesheet Details
The workhorse of this stylesheet…is a pony.
Like almost everything else about this amazingly simple application, the style
sheet has only one moving part (did I just mix another pair of metaphors?).
<xsl:template match="/">
<span id="container">
<xsl:for-each select="vmmap/region">
<span>
<xsl:attribute name="style">
<xsl:eval>colorMe(this)</xsl:eval>
</xsl:attribute>
<xsl:attribute name="title">
Base Addr: 0x<xsl:value-of select="@baseAddr"/> _
<xsl:apply-templates select="@size"/> _
<xsl:value-of select="@type"/> Bytes _
<xsl:value-of select="@descrip"/>
</xsl:attribute>
<xsl:eval>memString(this)</xsl:eval>
</span>
</xsl:for-each>
</span>
</xsl:template>
First, since the template uses the match="/" attribute/value, it operates off the document root of the XML data.
The document root is not an element, it’s all elements in the document
(including the document element and any XML processing instruction nodes). The
document element (sometimes called the root element – but don’t use this
terminology since it’s too easy to get confused) is the single XML node from which
all other XML nodes descend. NOTE: this stylesheet continues to work if you
remove the match="/" expression and add a "/" before the "vmmap/region" string. This may account for why my stylesheet works even when I’m
passing in a document element of XML instead of a document root.
Next, the stylesheet outputs a containing
SPAN tag. I originally used a DIV tag and I
broke the stylesheet into separate lines for easier reading, but for the
longest time I puzzled over the reason I was getting extra spaces between lines
on my display. As soon as I switched to an inline tag like SPAN, the line gaps went away, but there were still spaces between the
different colored regions in the output. If you look in the source code you
will see what I did to remove those last little distractions: I put all the xsl
expressions in this template on a single line. Given the nuances of XSL and
whitespace, and XSL’s pickiness about spaces around <xsl:attribute> tags, it’s generally a good idea to write your <xsl:attribute> oriented stylesheets on one line of code. Oh, and if you remove the
containing tag (SPAN or DIV matters not), then you’ll see the line gaps return.
Right, then…The next thing this template
does is loop through all the vmmap/region nodes, and the template
creates a new SPAN tag for each region node it traverses. Next, the stylesheet
creates an attribute for the SPAN tag. Since XSL can’t do that explicitly, it
provides the function through one of its own tags, <xsl:attribute>. The name attribute of the attribute tag creates the attribute of
the output tag. This would be true of any tag that is output, including another
xsl tag (yes, you can use xsl to create xsl stylesheets). This is a good time
to mention that if you click either the title or the image you will see the
HTML that this stylesheet outputs.
If you look at the resulting output you
will see that the SPAN tag has a STYLE attribute that begins by setting the background-Color of the span. This
color is coded to the type of virtual memory being reported by the component
and transmitted to the client in the “type” attribute of the XML node for the
current region. Here’s the script that creates this color attribute’s value:
There are a few comments I should make at
this point. First, xsl can process script internally to the stylesheet. It can
also output script that will be executed by the client. These are two
fundamentally independent kinds of script. Second, you
can use VBScript or any scripting language, but use the LANGUAGE attribute on
the <xsl:eval> tag if you do, and
use the scripting language’s keyword to point to the current object if that
keyword is not “this” (VBScript uses “me”). Oh, and if you haven’t guessed, the
<xsl:eval> tag is Microsoft’s
extension to enable scripting for stylesheets.
Ok, next comes the TITLE attribute that carries the same textual data that Mr. Richter’s
original program displayed. Note that to create the string used by TITLE, the stylesheet directly interrogated the current XML node’s baseAddr, type, and descrip
attributes using the <xsl:value-of>
tag. But one Richter value needs a bit of extra
processing, so we get to the size attribute on the XML node by calling <xsl:apply-templates> instead of the <xsl:value-of> tag used on the three other attributes just mentioned.
This <xsl:apply-templates> tag goes in search of a template that has a match attribute value
equal to the select attribute value used by <xsl:apply-templates>. It finds such a template when it sees this:
This is yet another example of having to be
very careful about whitespace. Again, in the source code you will find this
expression on one line. If you ran the template with those two extra line
feeds, you’d get the line feeds in the TITLE value, and you’d get two line
feeds because you told the template to preserve the whitespace. If you remove
the xml:space="preserve" you will only get one line feed, but you’ll lose the space just
before the closing </xsl:eval>
tag. So put everything on one line inside this
template, keep the preserve space attribute, and leave that little space after
the formatted nodeValue and everything will look just the way I intended it.
Oh, and the formatNumber() is a very cool feature for us poor JScript programmers (go on, yuk
it up good VBScripters <g/>). There are other formatting methods for
dates and things. See the MSDN docs for details.
The last step in this main template does
the heavy lifting. I use <xsl:eval> again to call memString()
passing in the current node. The memString() method computes the relative size of the span tag by creating a
string where (with one exception) each character represents the 4k page NT uses
to allocate virtual memory. There’s a funny story about this method. I developed
it while accessing only the first few memory regions sent back by the
component. These early regions are relatively small, and everything went fine
as I tested the method with small test XML files. Once I thought my stylesheet
had passed its sanity tests I ran it against a full XML file returned by the
component.
Scroll down the output of the files you’ve
just installed until you see a green area with the text “EXTREMELY LARGE MEMORY BLOCK.” The first time I ran my prototype against real data IE went
catatonic. After a few seconds I started getting worried. I took a look at
Jeff’s sample output in his book and noticed that some of the regions he
reported were big. So I hauled out my trusty calculator and did the arithmetic:
220,200,960 divided by 4,096…oops, 53,760! That was going to be one big colored
section! So I went ahead and let IE crunch. Sure enough, twenty minutes later
it had looped through the code below over fifty thousand times. Anxious to see
what it looked like, I started to scroll down. Twenty minutes after the first
mouse click, and still not seeing the big region, I gave up and went back to
changing the program to the following:
function memString(node)
{
var size=node.getAttribute("size");
var memString="";
if (size>65536)
{
var repeat=Math.floor(size/65536);
for (var i=1; i<=repeat; i++)
{
memString+=memPattern;
if (repeat>100 && i==50)
memString+=" ***EXTREMELY LARGE MEMORY BLOCK*** ";
The only thing left to mention about this
stylesheet is the <![CDATA[…]]> tag. It’s there to keep the xsl parser from poking around for text
to parse. When it sees this character data tag it says, “oh, only character
data…I want parsed character data so I’ll look for it elsewhere…” Always wrap
any script that’s inside a stylesheet with this tag so you don’t raise runtime
errors when the parser encounters a common character used in JScript programs,
“<” (used in the ubiquitous for() loop construct).
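Returning to the memString() listing, which breaks off before the function returns: the sketch below shows how the complete logic might look, and why coarser granularity tames the large region. It is a reconstruction under assumptions, not the original code. The value of memPattern is not shown in the source, so a single "#" stands in for it, and the small-region branch (one character per 4k page) is inferred from the prose.

```javascript
var PAGE = 4096;      // NT page size cited in the text
var CHUNK = 65536;    // threshold and divisor used in the listing
var memPattern = "#"; // hypothetical stand-in; the real pattern is not shown

// Takes the region size in bytes (the listing reads it from the node's
// "size" attribute) and returns the string that sizes the SPAN.
function memStringSketch(size) {
  var out = "";
  var i;
  if (size > CHUNK) {
    // Large region: one pattern per 64K instead of one per 4K page.
    var repeat = Math.floor(size / CHUNK);
    for (i = 1; i <= repeat; i++) {
      out += memPattern;
      if (repeat > 100 && i === 50) {
        out += " ***EXTREMELY LARGE MEMORY BLOCK*** ";
      }
    }
  } else {
    // Assumed branch: one character per 4K page for small regions.
    var pages = Math.ceil(size / PAGE);
    for (i = 0; i < pages; i++) out += memPattern;
  }
  return out;
}

var pagesInBigRegion = 220200960 / PAGE;   // 53760 iterations at 4K
var chunksInBigRegion = 220200960 / CHUNK; // 3360 iterations at 64K
var bigRegion = memStringSketch(220200960);
```

For the 220,200,960 byte region from the story, the loop count drops from 53,760 to 3,360, a sixteenfold reduction.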
Summary
This session has been especially
pedestrian. But that’s the point: XML is a working man’s tool (and women use
it, too). Not only is the tree a relatively easy structure to navigate, the
same structure that is used to analyze data can be used to display it.
Along the way I showed you some practical
tools: one to transform the ubiquitous csv file format into XML, and another
to transform a data format that is hard for humans to judge quickly
into a form they can assess instantly. Nothing fancy, but I hope it makes up
for a lack of glamour with at least as much time saving value. And I suspect
each one of you is at least as busy as I am, and that time, not eye candy, has
a far greater value to you personally.