Importing XML Data to Word 2003 Using Visual Studio Tools for Office

퓨전마법사 2005. 11. 21. 09:42

Iouri Simernitski
Mark Iverson
Microsoft Corporation

September 2005

Applies to:
Microsoft Office Professional Edition 2003
Microsoft Office Word 2003
Microsoft Visual Studio .NET 2003
Microsoft Visual Studio Tools for the Microsoft Office System, Version 2003

Summary: Learn about importing XML data from a database into Word documents using Visual Studio Tools for Office, Version 2003. Learn to create a document programmatically that contains a button control allowing the user to access a database. (9 printed pages)

Download OfficeVSTO_WordToImportXML.exe.

Contents

Introduction
Options for Inserting XML into a Word Document
Extracting XML Data Using XPath
Inserting Arbitrary XML
Conclusion
Additional Resources
About the Authors

Introduction

Microsoft Office Word 2003 opened new possibilities for the Word developer by introducing an XML version of its file format, known as WordprocessingML. You can now save a Word document as an XML or WordprocessingML file that describes the document, with elements representing items such as text, tables, lists, styles, and everything else necessary to describe a document fully. Because you are no longer limited to a binary file format, the possibilities of programmatically manipulating a Word document are dramatically increased. Also, in Word you can attach a custom schema to a document and use it to work with and manipulate custom XML data. This enables you to insert elements from a schema into a document to hold specific information. For example, if you attach a schema to a document and that schema describes data specific to books, you can add elements like <author>, <isbn>, <title>, and <copyrightdate> to the document to store information in XML files that adhere to the added schema. Your ability to import data from a database into one or many documents increases when you work programmatically. You can also parse a WordprocessingML file for data inside a schema's elements and retrieve the data for use elsewhere. This article demonstrates one possibility by showing how you can take information from a database and persist it inside a Word document in the form of a table.

Options for Inserting XML into a Word Document

There are two approaches for inserting XML into a Word document:

  • Take XML data and transform it into a Word document in XML format.
  • Run code using Microsoft Visual Basic for Applications (VBA) or Microsoft Visual Studio Tools for the Microsoft Office System, Version 2003 (Visual Studio Tools for Office) that consumes XML data (or data in any format) and inserts the data into the correct place in a target Word document.

Inserting XML Data Using an XSL Transformation

One of the best ways to transform XML into a WordprocessingML file is to use an XSL Transformation (XSLT) style sheet. XSLT is a language for transforming XML data into another format, making it a natural fit for the job of converting XML into a Word file. However, this approach has drawbacks. One is that you must create fragments of WordprocessingML and piece them together into a resulting Word document, which is rarely a trivial task. To assist with this task, you can download the Office 2003 Tool: WordprocessingML Transform Inference Tool and use it to generate an XSLT from a Word document that contains XML markup conforming to a user-defined schema. You can use this generated XSLT to take XML files that conform to the same user-defined schema and transform them into Word documents with the same look and format.

You must mark up the Word document on which you run the XSLT Inference Tool with the appropriate schema for the XML data it consumes. This can lead to limitations. For example, if one of the XML files you want to transform diverges from the user-defined schema even slightly, say by having an element that occurs more times than the schema expects, or that repeats in several places, then the transform with the tool-generated XSLT fails. If you use the XSLT option, you must ensure that all your input XML files are simple and consistently conform to your schema.

Also note that a tool-generated XSLT works from a single input document and does not aggregate data from several sources. For example, you may want to combine data from a Books.xml file with data from a Suppliers.xml file in the same document, but the generated XSLT gives you no way to do this. Therefore, you might consider using an XSLT if you have only one XML source file. If you want to build a Word solution for aggregating data from different XML sources, using Visual Studio Tools for Office is the better option.

Inserting XML Data Using Visual Studio Tools for Office

Inserting data using Visual Studio Tools for Office offers more flexibility than the XSLT option. It requires that you have a Word document or template to start with, but you are free to use the Word and Microsoft Office object models to transform and insert XML data. One way to do so is to use an XML schema to mark up the places in the document where you want to insert data and then put data in the designated locations.

If this sounds easy, stop for a moment to consider how your code determines which piece of data goes into which location. If the Word document shares the XML schema with the data, the task is relatively easy. For an example of such code, see Estimates Sample. However, sharing a schema is not always possible. If the Word document has a different schema from the schema of the data, you need to indicate which element from the data goes into which element of the document. One option is to insert the contents of the data element into the document element directly in code so that, for example, you retrieve the contents of element A from the data and insert it into element B in the Word document. However, a better option is to assign an XML attribute to every XML element in the Word document, indicating which piece of data goes with each element. The value of that attribute is an XML Path Language (XPath) expression, which identifies the data in an XML file.

Extracting XML Data Using XPath

XPath is a powerful query language for XML data. If you are not familiar with XPath and you use XML for development, you are strongly encouraged to familiarize yourself with it. For more information about XPath, see XML Path Language (XPath). This article is limited to a small subset of the XPath language. Consider the following XML document.

<a>
  <b>
    This is the first b element
  </b>
  <b>
    This is the second b element
    <a>
      This is a under b
      <b>
        This is the third b element
      </b>
    </a>
  </b>
</a>

Here are some examples of XPath expressions based on this XML file.

Table 1. Examples of XPath expressions

Expression | Meaning | Result
---------- | ------- | ------
/a | All top-level <a> elements | One element: the first <a> element
/a/b | All <b> elements that are children of element <a>; element <a> should be the top-level element of the document | Two elements: the first and second <b> elements
//a | All <a> elements anywhere in the document | The two <a> elements
/a/b[1]/text() | Text inside the first subelement <b> of <a>; element <a> should be the top-level element of the document | The text: "This is the first b element"
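The expressions in Table 1 can be tried out with a few lines of code. The following is a minimal sketch in Python (chosen for brevity; the sample project itself is written in C#) that uses the standard xml.etree.ElementTree module. That module supports only a subset of XPath and evaluates paths relative to an element, so each call below is an approximate equivalent of the absolute expression named in its comment, not a literal XPath evaluation.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<a>
  <b>
    This is the first b element
  </b>
  <b>
    This is the second b element
    <a>
      This is a under b
      <b>
        This is the third b element
      </b>
    </a>
  </b>
</a>"""

# root is the top-level <a> element, so "/a" corresponds to root itself.
root = ET.fromstring(SAMPLE)

# /a/b : the <b> elements that are direct children of the top-level <a>
b_children = root.findall("b")

# //a : every <a> element in the document (iter() includes root itself)
all_a = list(root.iter("a"))

# /a/b[1]/text() : text of the first <b> child, with surrounding white space trimmed
first_b_text = root.find("b[1]").text.strip()

print(len(b_children))  # 2
print(len(all_a))       # 2
print(first_b_text)     # This is the first b element
```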

Inserting Arbitrary XML

This section describes how to insert XML that conforms to a specific schema. It covers both elements that the schema specifies to appear only once and elements that the schema allows to repeat.

To insert XML with single elements into Word

  1. Create a reusable schema for data insertion.

    The schema has a top-level <document> element containing any number of <item> subelements. Every <item> element has an xpath attribute that contains the XPath to the corresponding node in source XML. See Figure 1.

    Figure 1. Word document annotated with a simple schema for XML mapping

  2. Create a reusable function that replaces the content of any <item> element in a Word document with the data referred to by the XPath contained in that element's xpath attribute. This function likely uses the Word document object and the XML file as its parameters.
  3. When you build a solution that inserts XML data into a Word document, you mark up regions of the document with the schema you created in Step 1, then set the appropriate xpath attribute on each of the <item> elements and call the function that you created in Step 2.
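The function from Step 2 can be sketched as follows. This is an illustration in Python, not the code from the sample project: it works on a plain XML tree instead of a Word document object, the name fill_items and the book data are hypothetical, and the xpath attributes are assumed to hold paths that xml.etree.ElementTree can evaluate relative to the root of the source XML.

```python
import xml.etree.ElementTree as ET


def fill_items(document_root, data_root):
    """Replace the content of every <item> with the data its xpath attribute selects."""
    for item in document_root.iter("item"):
        xpath = item.get("xpath")
        if xpath is None:
            continue  # an <item> without a mapping is left unchanged
        node = data_root.find(xpath)
        # An XPath that selects nothing leaves the item empty.
        item.text = node.text if node is not None else ""


# Hypothetical marked-up document and source data, for illustration only.
document = ET.fromstring(
    '<document><item xpath="title"/><item xpath="author"/></document>'
)
data = ET.fromstring(
    "<book><title>Sample Title</title><author>Sample Author</author></book>"
)

fill_items(document, data)
print([item.text for item in document.iter("item")])
```

Because the mapping lives in the xpath attributes and not in the code, the same function can serve any document marked up with the schema from Step 1.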

The approach just described for inserting arbitrary XML into a document works well for elements that always occur only once in a file. If you represent a list that varies in size according to input data, this approach does not work. A more complicated schema and function are required. When the function encounters a repeatable element in the Word document, it must repeat it as many times as necessary and then populate the resulting elements with corresponding XML data.

Inserting XML with Repeating Elements into Word

Although this solution focuses on inserting repeating elements, it also works for schemas with non-repeating elements. The solution includes a reusable schema for annotating documents (ListSchema.xsd), a code library (WordXmlInsert.dll) and a sample Word project using the library.

The library includes a function that populates parts of a Word document with corresponding parts of an XML document, resizing any lists and tables as needed. As sample data, a partial XML representation of the Northwind database is provided. There is data about the Northwind customers, each of whom has zero or more orders, each of which contains one or more products.

The project features a table and a paragraph, both linked to the same document source. A button in the Word document, when clicked, provides a user interface for choosing which customer data is displayed in the document.

The XML schema, titled ListSchema.xsd, included with the sample files referred to in this article, describes a hierarchical table, so it is allowed to have sub-tables. At design time, you only have to design and mark up (with XML) the first row of each table. You must enclose the whole list in a <List> element. Inside the <List> element, additional rows are automatically added as needed. Remember that this <List> element must have an xpath attribute that corresponds to a node containing data for every list item that appears in the table. The <ListItem> elements in each row must also have an xpath attribute containing an XPath pointing to that item's data.

Using Variable References in XPath

The value of the xpath attribute is allowed to contain variable references. In the Northwind example, the XML data contains order data for multiple customers. However, because the information is shown for one customer at a time, a variable is used in the XPaths to parameterize the query, such as "/NorthwindDataSet/Customers[$Customer]/Orders[1]/Order Details[1]/UnitPrice". In addition to any number of user-defined variables, there are special variables, such as row1, row2, and so on, that are replaced with the row number when the table is filled.
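One simple way to resolve such variable references is plain text substitution before the XPath is evaluated. The sketch below, in Python, assumes that a reference is written as a dollar sign followed by the variable name; the function name substitute_variables and the sample values are hypothetical, and the library in the sample project may resolve variables differently.

```python
import re


def substitute_variables(xpath, variables):
    """Replace each $name reference in xpath with the matching value."""
    def replace(match):
        # match.group(1) is the variable name without the leading "$".
        return str(variables[match.group(1)])

    return re.sub(r"\$(\w+)", replace, xpath)


template = "/NorthwindDataSet/Customers[$Customer]/Orders[$row1]/OrderID"
print(substitute_variables(template, {"Customer": 3, "row1": 2}))
# /NorthwindDataSet/Customers[3]/Orders[2]/OrderID
```

A reference to a variable that has no value raises a KeyError in this sketch, which makes a mistyped variable name easy to spot.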

How Resizing Works

To populate a variable-sized table

  1. Evaluate the XPath for each <List> (that is, the value of the xpath attribute of the <List> element) against the root element of the XML data. This provides the node set with which to fill the table.
  2. Resize the table so that it has as many rows as there are items in the node set. This is done by adding or deleting <ListRow> elements.
  3. For each <ListRow> element, set the row variable for the current nesting level (row1 for the outermost table, row2 for a sub-table, and so on) to the number of that row, and then evaluate the XPath for each <ListItem>. If the <ListItem> contains a <List> subelement, then there is a sub-table. When there is a sub-table, the code recursively returns to Step 1.

Here is pseudocode that describes these three steps.

InsertXML (XMLNodeForWordTable, XMLDocumentWithData, Level := 1)
Begin
    // Step 1: the node set selected by the table's XPath gives the row count
    RequiredRowCount := Count(EvaluateXPath(XMLDocumentWithData,
        AttributeValue(XMLNodeForWordTable, "xpath")))
    CurrentLevelVariable := Concatenate("row", Level)
    CurrentRowNum := 0
    // Step 2: resize the table
    AddOrRemoveChildNodes(XMLNodeForWordTable, RequiredRowCount)
    // Step 3: fill each row, recursing into sub-tables
    For Each CurrentRow in ChildNodes(XMLNodeForWordTable)
        CurrentRowNum := CurrentRowNum + 1
        For Each CurrentColumn in ChildNodes(CurrentRow)
            If CurrentColumn.ContainsChildNode("List") Then
                InsertXML(CurrentColumn.ChildNode("List"),
                    XMLDocumentWithData, Level + 1)
            Else
                XPathToData := AttributeValue(CurrentColumn, "xpath")
                XPathToData := Substitute(XPathToData, CurrentLevelVariable,
                    CurrentRowNum)
                CurrentColumn.SetTextContent(
                    EvaluateXPath(XMLDocumentWithData, XPathToData))
            End If
        End For Each
    End For Each
End
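For readers who want to experiment with the algorithm outside Word, the three steps can be approximated in Python. This is a sketch under several assumptions, not the implementation in WordXmlInsert.dll: the table is a plain XML tree of <List>, <ListRow>, and <ListItem> elements, the xpath attributes use the XPath subset that xml.etree.ElementTree supports and are relative to the root of the data, every list has at least one row at design time, and the sample data and order numbers are made up.

```python
import copy
import re
import xml.etree.ElementTree as ET


def substitute(xpath, variables):
    """Replace each $name reference in xpath with its value from variables."""
    return re.sub(r"\$(\w+)", lambda m: str(variables[m.group(1)]), xpath)


def insert_xml(list_node, data_root, variables=None, level=1):
    """Resize and fill one <List>; recurse into nested <List> elements."""
    variables = dict(variables or {})  # copy so the caller's values are untouched

    # Step 1: the node set selected by the list's XPath sets the row count.
    nodes = data_root.findall(substitute(list_node.get("xpath"), variables))

    # Step 2: add or delete <ListRow> elements to match the node set.
    rows = list_node.findall("ListRow")
    template = rows[0]  # assumes at least one row was marked up at design time
    for extra in rows[len(nodes):]:
        list_node.remove(extra)
    for _ in range(len(rows), len(nodes)):
        list_node.append(copy.deepcopy(template))

    # Step 3: set the row variable for this level, then fill each item.
    for row_num, row in enumerate(list_node.findall("ListRow"), start=1):
        variables[f"row{level}"] = row_num
        for item in row.findall("ListItem"):
            sub_list = item.find("List")
            if sub_list is not None:
                insert_xml(sub_list, data_root, variables, level + 1)
            else:
                found = data_root.find(substitute(item.get("xpath"), variables))
                item.text = found.text if found is not None else ""


# Made-up data: one customer with two orders.
data = ET.fromstring(
    "<NorthwindDataSet><Customers>"
    "<Orders><OrderID>10248</OrderID></Orders>"
    "<Orders><OrderID>10249</OrderID></Orders>"
    "</Customers></NorthwindDataSet>"
)
# Only the first row is marked up; the second is added when the list is filled.
table = ET.fromstring(
    '<List xpath="Customers[$Customer]/Orders"><ListRow>'
    '<ListItem xpath="Customers[$Customer]/Orders[$row1]/OrderID"/>'
    "</ListRow></List>"
)

insert_xml(table, data, {"Customer": 1})
print([item.text for item in table.iter("ListItem")])
```

Unlike the pseudocode, this sketch also substitutes variables in the xpath of each <List>, so that a sub-table can refer to the row variable of its parent table.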

When a list has no data, its table disappears from the document; when data becomes available again, the table reappears with the original formatting. This works because, when the table disappears, its corresponding WordprocessingML is encoded and saved as the value of the savedContent attributes of the <List> element. The plural "attributes" is intentional: attribute length in Word is limited to 8000 characters, so the encoded value is split into fragments of at most 8000 characters that are assigned to attributes named savedContent0, savedContent1, and so on. The encoding pass is required to make sure that invalid characters (such as "<") are not assigned to an attribute value, and to avoid white space normalization. As an example of white space normalization, an attribute value in which "a" and "b" are separated by several spaces or by a line break can be treated as equivalent to name = "a b" and is usually normalized to that form. After new data is available, the attribute values are concatenated and decoded (this is described later in this section). The resulting WordprocessingML is then inserted into the <List> element.

We use a similar technique to display customized text when there is no data available. The attributes (contentWhenEmpty0, contentWhenEmpty1, and so on) of the <List> element are assigned the value of the WordprocessingML paragraph that is displayed when the table is empty. Note that before the value is assigned to the attribute, it is encoded and split, to get around the 8000-character limit discussed in the previous paragraph.
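The splitting and rejoining of long values can be sketched as follows. The Python below is only an illustration: the function names are hypothetical, and a dictionary stands in for the attributes of the <List> element.

```python
def split_into_attributes(name, value, limit=8000):
    """Split value into attributes name0, name1, ... of at most limit characters each."""
    return {
        f"{name}{index}": value[start:start + limit]
        for index, start in enumerate(range(0, len(value), limit))
    }


def join_attributes(name, attributes):
    """Concatenate name0, name1, ... in order until a number is missing."""
    parts = []
    index = 0
    while f"{name}{index}" in attributes:
        parts.append(attributes[f"{name}{index}"])
        index += 1
    return "".join(parts)


encoded = "x" * 20000  # stands in for an encoded WordprocessingML string
attributes = split_into_attributes("savedContent", encoded)
print(sorted(attributes))  # ['savedContent0', 'savedContent1', 'savedContent2']
print(join_attributes("savedContent", attributes) == encoded)  # True
```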

Because WordprocessingML is verbose, we first compress and then encode it using Base64 encoding. Unfortunately, there is no compression class in Microsoft .NET Framework 1.1 (although you can look forward to one in the upcoming 2.0 version). So, we use Portable Network Graphics (PNG) bitmap encoding for compression because it is readily available in Microsoft .NET Framework and does not require any third-party library to be deployed to users' machines. A bitmap is created with bits from the string to compress. That bitmap is then persisted to a MemoryStream in PNG format, which uses lossless compression. Then, the content of the MemoryStream is encoded to a Base64 string. Implementation details are in the file named Compress.cs in the accompanying project.
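The compress-then-encode idea can be illustrated with a short Python sketch. Note the substitution: this sketch uses the zlib module (DEFLATE, the same compression method that PNG uses internally) in place of the PNG bitmap technique, so it shows the shape of the round trip and not the contents of Compress.cs. The function names and the sample markup are hypothetical.

```python
import base64
import zlib


def encode_wordml(wordml):
    """Compress the markup, then Base64-encode it so it is safe in an attribute."""
    compressed = zlib.compress(wordml.encode("utf-8"))
    return base64.b64encode(compressed).decode("ascii")


def decode_wordml(encoded):
    """Reverse encode_wordml: Base64-decode, then decompress."""
    return zlib.decompress(base64.b64decode(encoded)).decode("utf-8")


# Repetitive markup, as WordprocessingML tends to be, compresses well.
sample = "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>" * 200
encoded_sample = encode_wordml(sample)

print(len(sample), len(encoded_sample))
print(decode_wordml(encoded_sample) == sample)  # True
```

The Base64 alphabet contains no "<" and no white space, so the encoded string avoids both problems that the encoding pass is meant to solve.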

To run the test solution

  1. Open the attached solution in Microsoft Visual C# .NET.
  2. On the Debug menu, click Start to compile and run. The custom build steps in the solution ensure that the built assemblies are fully trusted under Microsoft .NET Framework version 1.1.
  3. A Word document titled "NorthwindSample.doc" appears. Inside the document, click "Click to Select a Customer".
    Note: You must be out of Design mode to click this button. You can toggle in and out of Design mode from the Toolbox.
  4. In the CustomerForm dialog box that displays, select a customer from the combo box. Click OK. The dialog box disappears and the information for the selected customer appears in a Word table.
  5. To display formatted placeholder text that appears when a table is empty, repeat Steps 3 and 4, selecting the customer "FISSA Fabrica Inter. Salchichas S.A." This customer has no data, so the document displays the placeholder text instead of a table. Selecting a different customer causes the table to reappear.

Conclusion

XML data is increasingly common and useful; however, few would consider writing or reading raw XML to perform everyday tasks. Using Visual Studio Tools for Office, one can use Word to present complex XML data in a well-formatted document. The downloadable solution provides a library that contains reusable code for presenting XML by writing just a few lines of code. The library also contains tools for creating and testing tables mapped to XML data.

Additional Resources

You can find additional information about Word and Visual Studio Tools for Office at the following resources.

About the Authors

Iouri Simernitski is a software design engineer in the Microsoft Trinity group where he develops sample solutions using Visual Studio and Office.

Mark Iverson is a programmer writer in the Microsoft Office User Assistance Group where he writes developer content for Word.