BizTalk 2004 Flat File Schema Tutorial 2

by Tomas Restrepo

In this tutorial, I'll show you how to create a schema for a somewhat more complex Flat File involving both delimited and positional records, and how to use tags to distinguish between different possible records at one position in the file.

Consider this example file:


CITYLIST
CSEATTLE                       WA00198776
SWASHINGTON                    WA
SARIZONA                       AZ
CTUCSON                        AZ89112299
.
MILKPRICES
SEATTLE                         1000USD
TUCSON                         19200USD
      

What this has is information about two different topics: A list of cities, and milk prices, which form the two different parts of the file. Each one is formed by a set of records, but the records in each one of the parts is very different. Also, the first part (the list of cities) contains two different kinds of records (one for Cities, and one for States).

Let's see the definition of the file in more detail:

The City Records are defined like this:

The State records are defined like this:

The Milk Price records are defined like this:

As you can see, this file, although simple in appearance, has a few interesting challenges for parsing. First of all, it contains records of several kinds, divided in two different sections. Also, it has both a delimited structure (the sections are delimited among themselves by the '\r\n.\r\n' sequence) as well as positional records. It also has a few interesting things, like paddings on a few elements.

Let's start then to create a schema for it. Let's call this schema "MultiFile" ;) For this schema, we'll configure the following properties:

Property Value
TargetNamespace urn:schemas-winterdom-com:multifile
Case Uppercase

Our MultiFile schema will begin with a single root element, called MultiFile. Since this node will contain both of our sections in the file, we want it to be configured as a delimited record. We'll also want it to contain an <xs:all/> instead of an <xs:sequence/>, since we'd ideally like it to support the sections in any order. Here's how we'd configure it:

Property Value
Structure Delimited
Child Order Infix
Child Delimiter Type Hexadecimal
Child Delimiter 0x0D 0x0A 0x2E 0x0D 0x0A (\r\n.\r\n)

Inside this root node, we'll insert our section records. Let's start with the CityList record, which we'll configure like this:

Property Value
Structure Delimited
Tag Identifier CITYLIST
Group Order Type choice
Group Max Occurs unbounded
Group Min Occurs 0
Preserve Delimiter for Empty Data No
Child Order Infix
Child Delimiter Type Hexadecimal
Child Delimiter 0x0D 0x0A (\r\n)

This is where things start to get interesting. Notice how we use the Tag Identifier property to tell BizTalk that it can recognize that the record is of type CityList because it begins with the string CITYLIST (since we don't set the Tag Offset property, it is assumed to be 0).

Now we need to insert two child records, one for the City record, and one for the State record. For each one, we'll set up in our schema each field in the record as an attribute of the City Node. Each one should be configured like described below.

City
Property Value
Structure Positional
Tag Identifier C

Fields:
Name
   Justification           Left
   Pad Character           ' ' (space)
   Pad Character Type      Character
   Data Type               xs:string
   Positional Length       30
   Positional Offset       1
State
   Data Type               xs:string
   Positional Length       2
NumSewers
   Justification           Right
   Pad Character           0
   Pad Character Type      Character
   Data Type               xs:int
   Positional Length       8
      
State
Property Value
Structure Positional
Tag Identifier S

Fields:
Name
   Justification           Left
   Pad Character           ' ' (space)
   Pad Character Type      Character
   Data Type               xs:string
   Positional Length       30
   Positional Offset       1
Initials
   Data Type               xs:string
   Positional Length       2
      

Armed with this, we'd suppose we should now be able to parse this:


CITYLIST
CSEATTLE                      WA00198776
SWASHINGTON                   WA
SARIZONA                      AZ
CTUCSON                       AZ00002299
   

Right? Wrong! As you soon will have noticed, it fails. The reason, well, simple, actually. The Tag for the CityList element is defined as CITYLIST, not CITYLIST\r\n. So what BizTalk would parse right now would actually be this:


CITYLISTCSEATTLE                      WA00198776
SWASHINGTON                   WA
SARIZONA                      AZ
CTUCSON                       AZ00002299   
   

One would think that by setting the Child Delimiter Order property of the CityList node to "Prefix" would be enough, however, this is not true. The reason for this is that, since there are no actual fields in CityList (only child records), it simple won't recognize that delimiter there (nor insert it if you generate an instance from vs.net). If there actually was a small field just before the City and State records, it would be recognized as appropriate, though. This seems to me to be a small bug, though, since it is definitely not what one would expect.

So, what can we do about this?

In fact, afaik, there's no way to tell BizTalk that there's a CR/LF following the tag, at least directly from the UI. One way I found that works is to open up the XSD file directly in the text editor, find the tag_name attribute for CityList with the value CITYLIST&#x0D;&#x0A;. It's a fun trick in that we basically encode the CR/LF pair using character entities. I had a pleasant surprise to find it worked, but I definitely would like something better in the UI that allowed this. (Before anyone asks, yes, I did try putting that directly in the Tag Identifier property, but that won't work because BizTalk will encode the character entity into &amp;#0D; which doesn't work :().

Now, let's be honest... this is a bit of a hack, because we're basically telling BizTalk that the tag is actually "CITYLIST\r\n", which we know ain't the truth, but it works, and it's good enough for now.

Defining the rest of the document schema is very similar to what we've done so far. We need to add the MilkPrices node under the root, configured like this:

Property Value
Structure Delimited
Tag Identifier MILKPRICES&#x0D;&#x0A;
Group Order Type sequence
Preserve Delimiter for Empty Data No
Child Order Infix
Child Delimiter Type Hexadecimal
Child Delimiter 0x0D 0x0A (\r\n)

Under MilkPrices, we'll insert a positional record to contain each price, as well as the necessary fields, as defined below:

MilkPrice
Property Value
Structure Positional
Min Occurs 0
Max Occurs unbounded

CityName (Attribute Field)
   Justification           Left
   Pad Character           ' ' (space)
   Pad Character Type      Character
   Data Type               xs:string
   Positional Length       30

Price (Attribute Field)
   Justification           Right
   Pad Character           ' ' (space)
   Pad Character Type      Character
   Data Type               xs:decimal
   Positional Length       6
   
Currency (Attribute Field)
   Justification           Right
   Pad Character           ' ' (space)
   Pad Character Type      Character
   Data Type               xs:string
   Positional Length       3
      

Now, however, we run into a little snag. Depending on how our schema is defined, we can get into a case where either the schema fails to parse the sample file, or in which only a single record within the CityList section is recognized, while the rest is ignored.

The problem lies in the fact that, when you're defining the schema, you define, in essence, a state machine (might not be entirely obvious). In our case, the problem lies in the fact that the delimiters for both the MultiFile root node and the CityList node begin with \r\n. So, what happens is that the parser will process the first record inside CityList, and then find a \r\n. How will BizTalk know that this \r\n it found means the end of the records inside CityList, or the beginning of the delimiter for the MultiFile record?

Answer: it doesn't, so with our current configuration, the schema will fail to parse completely.

The simplest way to fix this that I found was to define the CityList delimiter to be postfix, and then change the delimiter in the MultiFile record to just be ".\r\n". This would eliminate the ambiguity, while allowing us to keep our file syntax.

You may have noticed that, so far, I've left one thing out: How to make sure that BizTalk Server recognizes the decimal numbers in the Price field of the MilkPrice record correctly. As it currently is, it will not parse the decimal digits correctly, and will just lead to a number that's a hundred times larger than it really is (there is no decimal separator in the file format).

Well, honestly? I haven't found a way to make this work from within the basic parser, there's just no property I can see that controls this. Two options come to mind:

Here's how our schema looks in the end:


<?xml version="1.0" encoding="utf-16"?>
<xs:schema xmlns="urn:schemas-winterdom-com:multifile" xmlns:b="http://schemas.microsoft.com/BizTalk/2003" elementFormDefault="qualified" targetNamespace="urn:schemas-winterdom-com:multifile" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:annotation>
    <xs:appinfo>
      <b:schemaInfo count_positions_by_byte="false" standard="Flat File" root_reference="MultiFile" case="upper" />
      <schemaEditorExtension:schemaInfo namespaceAlias="b" extensionClass="Microsoft.BizTalk.FlatFileExtension.FlatFileExtension" standardName="Flat File" xmlns:schemaEditorExtension="http://schemas.microsoft.com/BizTalk/2003/SchemaEditorExtensions" />
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="MultiFile">
    <xs:annotation>
      <xs:appinfo>
        <b:recordInfo structure="delimited" preserve_delimiter_for_empty_data="true" suppress_trailing_delimiters="false" sequence_number="1" child_order="infix" child_delimiter_type="hex" child_delimiter="0x2E 0x0D 0x0A" />
      </xs:appinfo>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:annotation>
          <xs:appinfo>
            <b:groupInfo sequence_number="0" />
          </xs:appinfo>
        </xs:annotation>
        <xs:element name="CityList">
          <xs:annotation>
            <xs:appinfo>
              <b:recordInfo sequence_number="1" structure="delimited" preserve_delimiter_for_empty_data="false" suppress_trailing_delimiters="true" tag_name="CITYLIST&#xD;&#xA;" child_order="postfix" child_delimiter_type="hex" child_delimiter="0x0D 0x0A" />
            </xs:appinfo>
          </xs:annotation>
          <xs:complexType>
            <xs:choice minOccurs="0" maxOccurs="unbounded">
              <xs:annotation>
                <xs:appinfo>
                  <b:groupInfo sequence_number="0" />
                </xs:appinfo>
              </xs:annotation>
              <xs:element name="City">
                <xs:annotation>
                  <xs:appinfo>
                    <b:recordInfo sequence_number="1" structure="positional" preserve_delimiter_for_empty_data="true" suppress_trailing_delimiters="false" tag_name="C" />
                  </xs:appinfo>
                </xs:annotation>
                <xs:complexType>
                  <xs:attribute name="Name" type="xs:string">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="1" justification="left" pad_char_type="char" pad_char=" " pos_length="30" pos_offset="1" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                  <xs:attribute name="State" type="xs:string">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="2" justification="left" pos_length="2" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                  <xs:attribute name="NumSewers" type="xs:int">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="3" justification="right" pos_length="8" pad_char_type="char" pad_char="0" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                </xs:complexType>
              </xs:element>
              <xs:element name="State">
                <xs:annotation>
                  <xs:appinfo>
                    <b:recordInfo sequence_number="2" structure="positional" preserve_delimiter_for_empty_data="true" suppress_trailing_delimiters="false" tag_name="S" />
                  </xs:appinfo>
                </xs:annotation>
                <xs:complexType>
                  <xs:attribute name="Name" type="xs:string">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="1" justification="left" pad_char_type="char" pad_char=" " pos_length="30" pos_offset="1" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                  <xs:attribute name="Initials" type="xs:string">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="2" justification="left" pos_length="2" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                </xs:complexType>
              </xs:element>
            </xs:choice>
          </xs:complexType>
        </xs:element>
        <xs:element name="MilkPrices">
          <xs:annotation>
            <xs:appinfo>
              <b:recordInfo sequence_number="2" structure="delimited" preserve_delimiter_for_empty_data="false" suppress_trailing_delimiters="false" tag_name="MILKPRICES&#xD;&#xA;" child_delimiter_type="hex" child_delimiter="0x0D 0x0A" child_order="infix" />
            </xs:appinfo>
          </xs:annotation>
          <xs:complexType>
            <xs:sequence>
              <xs:annotation>
                <xs:appinfo>
                  <b:groupInfo sequence_number="0" />
                </xs:appinfo>
              </xs:annotation>
              <xs:element minOccurs="0" maxOccurs="unbounded" name="MilkPrice">
                <xs:annotation>
                  <xs:appinfo>
                    <b:recordInfo sequence_number="1" structure="positional" preserve_delimiter_for_empty_data="true" suppress_trailing_delimiters="false" />
                  </xs:appinfo>
                </xs:annotation>
                <xs:complexType>
                  <xs:attribute name="CityName" type="xs:string">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="1" justification="left" pad_char_type="char" pad_char=" " pos_length="30" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                  <xs:attribute name="Price" type="xs:decimal">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="2" justification="right" pos_length="6" pad_char_type="char" pad_char="0" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                  <xs:attribute name="Currency" type="xs:string">
                    <xs:annotation>
                      <xs:appinfo>
                        <b:fieldInfo sequence_number="3" justification="left" pad_char_type="char" pad_char=" " pos_length="3" />
                      </xs:appinfo>
                    </xs:annotation>
                  </xs:attribute>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>