Importing XML Data to Word 2003 Using Visual Studio Tools for Office
Iouri Simernitski
September 2005

Summary: Learn about importing XML data from a database into Word documents using Visual Studio Tools for Office, Version 2003. Learn to create a document programmatically that contains a button control that lets the user access a database. (9 printed pages)

Download OfficeVSTO_WordToImportXML.exe.

Introduction

Microsoft Office Word 2003 opened new possibilities for the Word developer by introducing an XML version of its file format, known as WordprocessingML. You can now save a Word document as an XML (WordprocessingML) file that describes the document, with elements representing items such as text, tables, lists, styles, and everything else necessary to describe a document fully. Because you are no longer limited to a binary file format, you have far more options for manipulating a Word document programmatically.

Also, in Word you can attach a custom schema to a document and use it to work with custom XML data. This lets you insert elements from the schema into a document to hold specific information. For example, if you attach a schema that describes data specific to books, you can add elements like <author>, <isbn>, <title>, and <copyrightdate> to the document and store information in XML files that adhere to that schema.

When you work programmatically, you can import data from a database into one document or many. You can also parse a WordprocessingML file for the data inside a schema's elements and retrieve that data for use elsewhere. This article demonstrates one possibility by showing how you can take information from a database and persist it inside a Word document in the form of a table.

Options for Inserting XML into a Word Document

There are two approaches for inserting XML into a Word document: using an XSL Transformation (XSLT) style sheet, or using Visual Studio Tools for Office. Both are described in the following sections.
Inserting XML Data Using an XSL Transformation

One of the best ways to transform XML into a WordprocessingML file is to use an XSL Transformation (XSLT) style sheet. XSLT is a language for transforming XML data into a predefined format, which makes it a natural fit for converting XML into a Word file. However, this approach has drawbacks. First, you must create fragments of WordprocessingML and piece them together into the resulting Word document, which is not a trivial task. To help with this, you can download the Office 2003 Tool: WordprocessingML Transform Inference Tool and use it to generate an XSLT from a Word document that contains XML markup conforming to a user-defined schema. You can then use the generated XSLT to transform XML files that conform to the same schema into Word documents with the same look and format.

You must mark up the Word document on which you run the Inference Tool with the appropriate schema for the XML data it consumes, and this can be limiting. For example, if one of the XML files you want to transform diverges from the user-defined schema even slightly, say by containing an element more times than the schema expects or in several places, the transform with the tool-generated XSLT fails. If you use the XSLT option, you must ensure that all your input XML files are simple and consistently conform to your schema.

Also note that an XSLT transform is applied to a single input XML document, which makes it difficult to aggregate data from several sources. For example, you may want to combine data from a Books.xml file with data from a Suppliers.xml file in the same document, and XSLT offers no straightforward way to do this. Therefore, consider using an XSLT if you have only one XML source file. If you want to build a Word solution that aggregates data from different XML sources, Visual Studio Tools for Office is the better option.
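To make "piecing together fragments of WordprocessingML" concrete, here is a minimal sketch of the kind of output an XSLT would have to produce. It is written in Python rather than XSLT, and the Books.xml layout (a <title> element per book) is a hypothetical assumption. Each title becomes one <w:p><w:r><w:t> fragment in the Word 2003 XML namespace; a document saved by Word would normally also carry style and property elements that this sketch omits.

```python
import xml.etree.ElementTree as ET

# Namespace of Word 2003 XML (WordprocessingML).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)

# Word 2003 uses the mso-application processing instruction to recognize
# an XML file as a Word document.
HEADER = ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
          '<?mso-application progid="Word.Document"?>\n')


def w(tag):
    """Qualify a WordprocessingML tag name, e.g. w("p") -> "{...wordml}p"."""
    return "{%s}%s" % (W_NS, tag)


def books_to_wordml(books_xml):
    """Emit one WordprocessingML paragraph per <title> element in books_xml."""
    books = ET.fromstring(books_xml)
    doc = ET.Element(w("wordDocument"))
    body = ET.SubElement(doc, w("body"))
    for title in books.iter("title"):
        # Each fragment has the shape <w:p><w:r><w:t>text</w:t></w:r></w:p>.
        run = ET.SubElement(ET.SubElement(body, w("p")), w("r"))
        ET.SubElement(run, w("t")).text = title.text
    return HEADER + ET.tostring(doc, encoding="unicode")


print(books_to_wordml(
    "<books><book><title>First Book</title></book>"
    "<book><title>Second Book</title></book></books>"))
```

Even this tiny case shows why the approach gets harder as the target formatting grows: every table, style, and list in the finished document has to be emitted as explicit markup.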
Inserting XML Data Using Visual Studio Tools for Office

Inserting data using Visual Studio Tools for Office offers more flexibility than the XSLT option. It requires that you start with a Word document or template, but you can then use the Word and Microsoft Office object models to transform and insert XML data however you need. One way to do so is to use an XML schema to mark up the places in the document where you want to insert data, and then put data in the designated locations.

If this sounds easy, stop for a moment to consider how your code determines which piece of data goes into which location. If the Word document shares the XML schema with the data, the task is relatively easy. For an example of such code, see Estimates Sample. However, the document and the data do not always share a schema. If the Word document uses a different schema from the data, you need to indicate which element from the data goes into which element of the document. One option is to hard-code the mapping so that, for example, your code retrieves the contents of element A from the data and inserts it into element B in the Word document. A better option is to give every XML element in the Word document an XML attribute that indicates which piece of data goes with that element. The attribute value is an XML Path Language (XPath) expression, which identifies the data in an XML file.

Extracting XML Data Using XPath

XPath is a powerful query language for XML data. If you use XML in your development work and are not familiar with XPath, you are strongly encouraged to learn it. For more information about XPath, see XML Path Language (XPath). This article uses only a small subset of the XPath language. Consider the following XML document.
    <a>
      <b> This is the first b element </b>
      <b> This is the second b element
        <a> This is a under b
          <b> This is the third b element </b>
        </a>
      </b>
    </a>

Here are some examples of XPath expressions based on this XML file.

Table 1. Examples of XPath expressions
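If you want to experiment with path expressions against the sample document, Python's standard xml.etree.ElementTree module implements a limited subset of XPath that covers child steps, descendant steps, and positional predicates. The sketch below uses it (Python is an assumption here; the article's own solution is built on the .NET Framework). Note that ElementTree is not a full XPath 1.0 engine: functions such as count() are not available.

```python
import xml.etree.ElementTree as ET

# The sample document from the text; the outer <a> is the context node.
SAMPLE = ("<a><b> This is the first b element </b> "
          "<b> This is the second b element "
          "<a> This is a under b <b> This is the third b element </b> </a> </b> </a>")
root = ET.fromstring(SAMPLE)


def texts(path):
    """Evaluate path from the outer <a>; return each match's leading text, stripped."""
    return [e.text.strip() for e in root.findall(path)]


# b children of the outer a: the first and second b elements.
print(texts("b"))
# b descendants at any depth, in document order: all three b elements.
print(texts(".//b"))
# Positional predicate, 1-based as in XPath: only the second b child.
print(texts("b[2]"))
# Step by step: a b child, then its a child, then that a's b child.
print(texts("b/a/b"))
# a descendants of the outer a (the context node itself is not included).
print(texts(".//a"))
```

The difference between "b" and ".//b" is the one that matters most for the rest of this article: a single step selects direct children only, while a descendant step reaches elements nested at any depth.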
Inserting Arbitrary XML

This section describes how to insert XML that conforms to a specific schema. The schema allows some elements to repeat and specifies that others appear only once; the following procedure handles elements that appear only once.

To insert XML with single elements into Word
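As a rough illustration of the single-element approach, here is a minimal sketch in Python rather than against the Word object model. The element names (Letter, Company, Town, Phone), the data layout, and the convention that xpath values are relative to the data document's root element are all hypothetical. In a real solution the marked-up elements would live in the Word document, and you would set their text through Word's XML object model.

```python
import xml.etree.ElementTree as ET

# Hypothetical data document.
DATA = """<Customers>
  <Customer>
    <CompanyName>Contoso Ltd.</CompanyName>
    <City>Seattle</City>
  </Customer>
</Customers>"""

# Hypothetical custom XML from a Word document: each element's xpath attribute
# names the data it should display, relative to the data's root element.
DOC = """<Letter>
  <Company xpath="Customer/CompanyName"/>
  <Town xpath="Customer/City"/>
  <Phone xpath="Customer/Phone"/>
</Letter>"""


def fill_single_elements(doc_root, data_root):
    """Copy data into every element that has an xpath attribute.

    Only the first match is used, which is why this suits elements that
    occur once. Elements whose path matches nothing get empty text.
    """
    for elem in doc_root.iter():
        path = elem.get("xpath")
        if path is None:
            continue
        match = data_root.find(path)
        elem.text = match.text if match is not None else ""


doc_root = ET.fromstring(DOC)
fill_single_elements(doc_root, ET.fromstring(DATA))
print(ET.tostring(doc_root, encoding="unicode"))
```

Because find() returns only the first match, a data file with two <Customer> elements would silently show just the first one, which is the limitation the next part of the article addresses.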
The approach just described for inserting arbitrary XML into a document works well for elements that always occur only once in a file. It does not work if you represent a list whose size varies with the input data. That case requires a more complex schema and function: when the function encounters a repeatable element in the Word document, it must repeat that element as many times as necessary and then populate the resulting elements with the corresponding XML data.

Inserting XML with Repeating Elements into Word

Although this solution focuses on inserting repeating elements, it also works for schemas with non-repeating elements. The solution includes a reusable schema for annotating documents (ListSchema.xsd), a code library (WordXmlInsert.dll), and a sample Word project that uses the library. The library includes a function that populates parts of a Word document with corresponding parts of an XML document, resizing any lists and tables as needed.

As sample data, a partial XML representation of the Northwind database is provided. It contains data about the Northwind customers, each of whom has zero or more orders, each of which contains one or more products. The project features a table and a paragraph, both linked to the same data source. A button in the Word document, when clicked, displays a user interface for choosing which customer's data is shown in the document.

The XML schema named ListSchema.xsd, included with the sample files for this article, describes a hierarchical table, so a table can contain sub-tables. At design time, you only have to design and mark up (with XML) the first row of each table. You must enclose the whole list in a <List> element; additional rows are added automatically inside the <List> element as needed. This <List> element must have an xpath attribute that selects the data for the list items, one node for each row that appears in the table.
The <ListItem> elements in each row must also have an xpath attribute containing an XPath expression that points to that item's data.

Using Variable References in XPath

The value of the xpath attribute can contain variable references. In the Northwind example, the XML data contains order data for multiple customers. However, because the information is shown for one customer at a time, a variable is used in the XPath expressions to parameterize the query, such as

How Resizing Works

To populate a variable-sized table
Here is pseudocode that describes these three steps.

    InsertXML(XMLNodeForWordTable, XMLDocumentWithData, Level := 1)
    Begin
        RequiredRowCount := Count(EvaluateXPath(XMLDocumentWithData, AttributeValue(XMLNodeForWordTable, "xpath")))
        CurrentLevelVariable := Concatenate("Row", Level)
        CurrentRowNum := 0
        AddOrRemoveChildNodes(XMLNodeForWordTable, RequiredRowCount)
        For Each CurrentRow in ChildNodes(XMLNodeForWordTable)
            CurrentRowNum := CurrentRowNum + 1
            For Each CurrentColumn in ChildNodes(CurrentRow)
                If CurrentColumn.ContainsChildNode("List") Then
                    InsertXML(CurrentColumn.ChildNode("List"), XMLDocumentWithData, Level + 1)
                Else
                    XPathToData := AttributeValue(CurrentColumn, "xpath")
                    XPathToData := Substitute(XPathToData, CurrentLevelVariable, CurrentRowNum)
                    CurrentColumn.SetTextContent(EvaluateXPath(XMLDocumentWithData, XPathToData))
                End If
            End For Each
        End For Each
    End

The table reappears with its original formatting because, when the table disappears, its WordprocessingML is encoded and saved as the value of the savedContent attributes of the <List> element. The plural "attributes" is intentional: Word limits an attribute value to 8000 characters, so the encoded value is split into fragments of at most 8000 characters, which are assigned to attributes named savedContent0, savedContent1, and so on.

The encoding pass is required for two reasons. It ensures that characters that are not allowed in attribute values (such as "<") are never assigned to an attribute, and it protects the content from white space normalization. As an example of white space normalization, an XML parser replaces each line break or tab in an attribute value with a single space, so an attribute whose value has a line break between "a" and "b" becomes indistinguishable from name="a b".

When new data is available, the attribute values are concatenated and decoded (as described later in this section), and the resulting WordprocessingML is inserted into the <List> element. We use a similar technique to display customized text when there is no data available.
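The pseudocode above can be turned into a small runnable sketch. This version is written in Python against an in-memory XML tree rather than a Word document, and the markup (a <List> containing <Row> elements whose <ListItem> cells either carry an xpath attribute or contain a nested <List>) is a hypothetical stand-in for ListSchema.xsd. One assumption goes beyond the pseudocode: variable bindings such as $Row1 are passed down to nested lists, so a sub-table's xpath can refer to the row of its parent table. It also skips the savedContent handling, so resizing a list to zero rows discards the template row.

```python
import copy
import re
import xml.etree.ElementTree as ET

_VARIABLE = re.compile(r"\$(\w+)")


def substitute(path, bindings):
    """Replace each $Name in path with its bound value, e.g. $Row1 -> 2."""
    return _VARIABLE.sub(lambda m: str(bindings[m.group(1)]), path)


def resize(list_node, row_count):
    """Make list_node hold row_count copies of its first (template) row.

    With row_count == 0 the template is lost; the article's library keeps
    it in savedContent attributes instead.
    """
    rows = list(list_node)
    template = copy.deepcopy(rows[0])
    for row in rows:
        list_node.remove(row)
    for _ in range(row_count):
        list_node.append(copy.deepcopy(template))


def insert_xml(list_node, data_root, level=1, bindings=None):
    """Resize a <List> to match its data, then fill every row recursively."""
    bindings = dict(bindings or {})
    list_path = substitute(list_node.get("xpath"), bindings)
    resize(list_node, len(data_root.findall(list_path)))
    row_variable = "Row%d" % level
    for row_num, row in enumerate(list_node, start=1):
        bindings[row_variable] = row_num
        for cell in row:
            nested = cell.find("List")
            if nested is not None:
                insert_xml(nested, data_root, level + 1, bindings)
            elif cell.get("xpath") is not None:
                match = data_root.find(substitute(cell.get("xpath"), bindings))
                cell.text = match.text if match is not None else ""


# Hypothetical markup: one designed row per table; the inner List is a sub-table.
MARKUP = """<List xpath="Order">
  <Row>
    <ListItem xpath="Order[$Row1]/OrderID"/>
    <ListItem>
      <List xpath="Order[$Row1]/Product">
        <Row><ListItem xpath="Order[$Row1]/Product[$Row2]/Name"/></Row>
      </List>
    </ListItem>
  </Row>
</List>"""

# Hypothetical data for one customer: two orders with one or more products each.
DATA = """<Customer>
  <Order><OrderID>1</OrderID><Product><Name>Tea</Name></Product><Product><Name>Milk</Name></Product></Order>
  <Order><OrderID>2</OrderID><Product><Name>Coffee</Name></Product></Order>
</Customer>"""

table = ET.fromstring(MARKUP)
insert_xml(table, ET.fromstring(DATA))
print(ET.tostring(table, encoding="unicode"))
```

Only one row per table is designed by hand; after the call, the outer list has one row per order and each nested list has one row per product in that order.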
The attributes contentWhenEmpty0, contentWhenEmpty1, and so on of the <List> element are assigned the value of the WordprocessingML paragraph that is displayed when the table is empty. Before the value is assigned to the attributes, it is encoded and split to get around the 8000-character limit discussed in the previous paragraph.

Because WordprocessingML is verbose, we first compress it and then encode it using Base64 encoding. Unfortunately, there is no compression class in Microsoft .NET Framework 1.1 (although you can look forward to one in the upcoming 2.0 version). So we use Portable Network Graphics (PNG) bitmap encoding for compression, because it is readily available in the .NET Framework and does not require deploying any third-party library to users' computers. A bitmap is created from the bits of the string to compress. That bitmap is then saved to a MemoryStream in PNG format, which uses lossless compression. Finally, the content of the MemoryStream is encoded as a Base64 string. Implementation details are in the file named Compress.cs in the accompanying project.

To run the test solution
Conclusion

XML data is increasingly common and useful; however, few people would want to read or write raw XML to perform everyday tasks. With Visual Studio Tools for Office, you can use Word to present complex XML data in a well-formatted document. The downloadable solution provides a library of reusable code that lets you present XML by writing just a few lines of code. The library also contains tools for creating and testing tables mapped to XML data.

Additional Resources

You can find additional information about Word and Visual Studio Tools for Office at the following resources.
About the Authors

Iouri Simernitski is a software design engineer in the Microsoft Trinity group where he develops sample solutions using Visual Studio and Office. Mark Iverson is a programmer writer in the Microsoft Office User Assistance Group where he writes developer content for Word.