Don't forget the BOM!

May 9, 2008. Posted in: .NET | BizTalk

If you're writing text files in an encoding that supports a Byte-Order Mark, you should always try to make sure that you include it, unless you have a protocol in place that precludes you from doing so (such as a legacy application that doesn't know how to deal with them).

One of the reasons you should always remember the BOM is that many applications can use it to try to guess what encoding they should use when trying to read the text you're feeding them.

Encoding detection based on the BOM is not foolproof, but it's better than having nothing at all, particularly in cases where blindly assuming a predefined encoding such as UTF-8 or UTF-16 is simply not an option.
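To make the idea concrete, here is a minimal C# sketch of BOM sniffing, written as a top-level program. DetectEncodingFromBom and StartsWith are hypothetical helpers for illustration only; in real .NET code, StreamReader already does this for you when its detectEncodingFromByteOrderMarks argument is true. The sketch returns null when there is no BOM, which is exactly the case where a reader is left guessing.

```csharp
using System;
using System.Text;

// Hypothetical helper: true if data begins with the given byte sequence.
static bool StartsWith(byte[] data, params byte[] prefix)
{
    if (data.Length < prefix.Length) return false;
    for (int i = 0; i < prefix.Length; i++)
        if (data[i] != prefix[i]) return false;
    return true;
}

// Hypothetical helper: guesses the encoding from a leading BOM, or returns
// null if there is none. The UTF-32 LE BOM (FF FE 00 00) begins with the
// UTF-16 LE BOM (FF FE), so it must be checked first. Even so, this is a
// guess: UTF-16 LE text starting with U+0000 would look like UTF-32 LE.
static Encoding? DetectEncodingFromBom(byte[] data)
{
    if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return Encoding.UTF32;   // UTF-32 LE
    if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return Encoding.UTF8;          // UTF-8
    if (StartsWith(data, 0xFF, 0xFE)) return Encoding.Unicode;             // UTF-16 LE
    if (StartsWith(data, 0xFE, 0xFF)) return Encoding.BigEndianUnicode;    // UTF-16 BE
    return null;                                                           // no BOM
}

byte[] withBom = { 0xEF, 0xBB, 0xBF, 0x3C, 0x61, 0x2F, 0x3E };  // BOM + "<a/>"
byte[] withoutBom = { 0x3C, 0x61, 0x2F, 0x3E };                 // "<a/>"
Console.WriteLine(DetectEncodingFromBom(withBom)?.CodePage);     // 65001 (UTF-8)
Console.WriteLine(DetectEncodingFromBom(withoutBom) == null);    // True
```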

One particular case where remembering the BOM is very important is with UTF-8. Let me tell you a story to illustrate why:

For the past few weeks we've been testing and improving a BizTalk-based solution that another consulting company had created for a client. One particular piece had been working fine until, suddenly, BizTalk started failing to parse an incoming message, with an error stating that "The character is not valid on the specified encoding".

Looking at the message, it was supposed to be UTF-8 encoded, and for the most part looked OK. The character causing trouble was, in fact, a 0xA0 char (non-breaking space) inside an element value. While this was not good, it wasn't clear why it was causing trouble.

Since it was an XML message, we opened it up in Internet Explorer: Yep, that too parsed it incorrectly and got stuck when it reached the problematic character.

Looking a bit further, we found that in this particular case the original developer had written a piece of code that created a Stream object with the message contents and then fed that to BizTalk. The code looked a bit like this:

public static Stream CreateStream(String msg) {
   MemoryStream stream = new MemoryStream();
   // encode the whole message text as UTF-8 bytes
   byte[] bytes = Encoding.UTF8.GetBytes(msg);
   stream.Write(bytes, 0, bytes.Length);
   // rewind so the consumer reads from the beginning
   stream.Position = 0;
   return stream;
}

The message text itself was a piece of XML that included an <?xml?> declaration with the encoding attribute specifying UTF-8. This seemed OK, even if the code above was a pretty awkward way of creating the stream.

However, this gave us a clue: UTF8Encoding.GetBytes() won't give you a BOM. Looking at the message in a binary editor, we confirmed that it indeed had no BOM at all. So we tried replacing the code above with one that simply used a StreamWriter object configured to write UTF-8 with a BOM (for example, by passing it Encoding.UTF8; a StreamWriter created without an explicit encoding writes UTF-8 without a BOM), and that fixed the issue right away!
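For reference, a BOM-emitting version of CreateStream() might look like the following sketch (not necessarily the exact code we used). It assumes .NET 4.5 or later for the leaveOpen overload of StreamWriter, and it passes Encoding.UTF8 explicitly because that instance's preamble is EF BB BF. Prepending Encoding.UTF8.GetPreamble() to the GetBytes() output would work as well.

```csharp
using System;
using System.IO;
using System.Text;

// Sketch of a BOM-emitting CreateStream(). Encoding.UTF8 is passed explicitly
// because its preamble is EF BB BF; a StreamWriter created without an encoding
// writes UTF-8 without a BOM.
static Stream CreateStream(string msg)
{
    var stream = new MemoryStream();
    // leaveOpen: true keeps the MemoryStream usable after the writer is disposed
    using (var writer = new StreamWriter(stream, Encoding.UTF8, 1024, leaveOpen: true))
    {
        writer.Write(msg);
    } // disposing flushes the writer, which emits the BOM before the text
    stream.Position = 0;  // rewind so the consumer reads from the start
    return stream;
}

using var s = CreateStream("<?xml version=\"1.0\" encoding=\"utf-8\"?><a>x</a>");
var head = new byte[3];
s.Read(head, 0, head.Length);
Console.WriteLine(BitConverter.ToString(head));  // EF-BB-BF
```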

This highlights why the BOM is so important for UTF-8: characters in the ASCII range (0x00 to 0x7F) are encoded in UTF-8 as the very same single bytes they have in ASCII. This is generally an advantage, but it also means that incorrectly encoded content (such as in our example above) might seem to work fine for a while, until an unexpected character comes along and everything comes crumbling down. This is unlike other encodings, such as UTF-16, where things usually blow up right away.
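The ASCII overlap is easy to see by encoding a few strings. This C# sketch illustrates the point only; it does not reproduce the BizTalk failure. ISO-8859-1 is used as an assumed stand-in for any single-byte encoding: plain ASCII markup produces identical bytes in both encodings, the non-breaking space U+00A0 does not, and a stray 0xA0 byte on its own is not valid UTF-8.

```csharp
using System;
using System.Text;

// ISO-8859-1 stands in here for any single-byte encoding.
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
string ascii = "<a>hello</a>";
string nbsp = "\u00A0";  // non-breaking space

// Plain ASCII markup produces identical bytes in both encodings...
Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(ascii)) ==
                  BitConverter.ToString(latin1.GetBytes(ascii)));      // True
// ...but U+00A0 is two bytes in UTF-8 and a single byte in ISO-8859-1.
Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(nbsp)));  // C2-A0
Console.WriteLine(BitConverter.ToString(latin1.GetBytes(nbsp)));         // A0

// A lone 0xA0 byte is not valid UTF-8: the default decoder silently
// substitutes U+FFFD, while a strict decoder throws instead.
byte[] stray = { 0xA0 };
Console.WriteLine(Encoding.UTF8.GetString(stray) == "\uFFFD");           // True
var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
try
{
    strict.GetString(stray);
}
catch (DecoderFallbackException e)
{
    Console.WriteLine("strict decoder: " + e.GetType().Name);
}
```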

In this particular case, the culprit was really a combination of factors: The lack of a BOM together with the presence of the encoding specification in the XML declaration [1]. I'm not sure why the XML stacks get stuck on a BOM-less UTF-8 file with an encoding declaration, but there you have it. So don't forget the BOM!

[1] I personally think the encoding specification in the XML declaration is probably the single most stupid idea included in the XML spec. It's just downright evil.

Bitten by IEnumerable<T>

May 7, 2008. Posted in: .NET

Yesterday I was working on some fairly complex code involving LINQ to SQL, lots of generics (with some reflection generously sprinkled on top) and some extension methods as the icing on the cake.

Most of it was working as expected, until I ran into a little snag. I had one line of code that looked a bit like this:

DoSomething<MyType>(someCollection);

When I ran my NUnit tests, it appeared as if DoSomething() had never been executed. I knew that someCollection had some items in it, so I was pretty surprised. I fired up a debugger, put a breakpoint on the first line of DoSomething and executed the code. No dice: the breakpoint wasn't even hit.

So I went one step further and put a breakpoint right at the call to DoSomething. The breakpoint got hit, I pressed F11 (step into) and... it went right to the next line without stepping into the method. What was up with that?

After scratching my head a lot, I realized it had been my own cleverness that caused the problem. See, DoSomething was basically a map of sorts, defined somewhat like this:

IEnumerable<T> DoSomething<T>(IEnumerable list) {
   foreach ( var obj in list )
      yield return obj.Map<T>();
}

Can you spot the problem?

I had created the method so that I could reuse the transformed object stream after the mapping, just in case I needed it. However, in this particular call I was disregarding the return value and was simply interested in getting the internal function called for each item in the collection.

This turned out to be a really bad idea, because the code the C# compiler generates for an iterator like this is pretty smart: it's completely pipelined/streamed, so calling the method merely creates the IEnumerable<T> object, and none of the method body runs until something enumerates it (for example, with foreach). If you never consume the returned object, the source collection never gets iterated, and thus nothing happens.
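The effect is easy to reproduce in isolation. In this C# sketch, Map is a hypothetical stand-in for the Map<T>() extension method above that simply counts how many times it runs, and DoSomething is simplified to a non-generic iterator. Calling it without consuming the result runs nothing; enumerating it (here with ToList(), though a foreach loop works just as well) runs the body once per item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int mapCalls = 0;

// Hypothetical stand-in for the Map<T>() extension method: it counts its
// calls so we can see when (and whether) the iterator body runs.
string Map(object obj)
{
    mapCalls++;
    return obj.ToString() ?? "";
}

// Same shape as DoSomething<T>() above, minus the generics.
IEnumerable<string> DoSomething(IEnumerable<object> list)
{
    foreach (var obj in list)
        yield return Map(obj);
}

object[] items = { 1, 2, 3 };

DoSomething(items);             // only builds the iterator object
Console.WriteLine(mapCalls);    // 0: the body never ran

DoSomething(items).ToList();    // enumerating it runs the body
Console.WriteLine(mapCalls);    // 3
```

If all you want is the side effect, a plain method with a foreach loop and no yield return avoids the trap, since its body runs as soon as it is called.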

In hindsight, this should've been pretty obvious, and it's a very natural behavior, if you think about it. I should've been able to pick it up sooner, but I admit the debugger behavior surprised me a bit.

Async Pipelines and PipelineReader<T> Issues

April 7, 2008. Posted in: .NET | PowerShell

I've been spending some time this week coding some changes to a custom PowerShell PSHost for an application. One of the changes I wanted to experiment with was changing the code that executed commands so that it used Pipeline.InvokeAsync() instead of Pipeline.Invoke().

There are a couple of things that need to be handled differently in this case: how you process the results from the pipeline and how you handle errors. I'll concentrate on the first one, as it's the one that gave me a bit of trouble to get right.

To process the results from an asynchronous pipeline invocation, you need to use a PipelineReader<PSObject> object, which is what the Pipeline.Output property returns. This allows you to read objects generated by the pipeline execution as they are coming out (i.e. as soon as they are available) instead of waiting until the entire pipeline has executed to grab the results, so the idea is pretty nifty.

Unfortunately, the documentation on how to use this object correctly isn't very good. For example, you can't rely on the Count or IsOpen properties as boundary checks to detect how many items to attempt to read. In particular, the Count property isn't reliable if you're using Pipeline.InvokeAsync() because it only represents how many objects are currently available in the reader, not the total count of objects returned by the pipeline (this is natural once you realize it, but still).

Instead, you should really rely on the EndOfPipeline property of PipelineReader<T> to detect when you've reached the end of the object stream generated by the pipeline execution.

The second issue that's not very obvious is that when you use Pipeline.InvokeAsync() and you don't need to feed any input to the command, the pipeline won't really start executing until you close the PipelineWriter object returned by Pipeline.Input. If you don't do this, PipelineReader.Read() will simply block forever.

Phantom Objects

The one nasty issue I did run into is what appears to be a synchronization issue inside PipelineReader<T> itself. In my original attempt to use Pipeline.InvokeAsync(), I started getting some weird results: ghost objects were coming out of the reader.

Ghost objects?

Pretty much, yes. Let's say I executed an "ls" command on my pipeline that should return 8 items. Sometimes, I'd indeed get the expected 8 items out of the pipeline before EndOfPipeline changed to true. Other times, however, I'd see 9 items come out of it.

The last item was a "ghost" object that was empty: a PSCustomObject with no properties at all. Where was it coming from?

The only good thing about this was that, when it appeared at all, it was always the last item in the pipeline. This gave me a clue: could this be a marker object inserted internally by PowerShell into the object stream to mark the end of the pipeline? It sure looked like some kind of null value.

And that's what it turned out to be: AutomationNull.Value, which, although defined in the System.Management.Automation.Internal namespace, is a public type/property.

The reason I say this problem is a synchronization issue is that, for the user of PipelineReader<T>, the use of this marker object should've been transparent. Instead, it is a leaky abstraction that sometimes (and just sometimes!) gets exposed and returned from PipelineReader.Read(), when that should never happen!

In the end, I rewrote my code like this to work around the problem:

PipelineReader<PSObject> results = pipeline.Output;
while ( !results.EndOfPipeline ) {
   PSObject obj = results.Read();
   // skip AutomationNull.Value (System.Management.Automation.Internal),
   // the "$null" marker signaling the end of the pipeline, which
   // occasionally leaks out of Read()
   if ( obj != AutomationNull.Value ) {
      // do something with the object
   }
}

Dev Environment for PowerShell

March 13, 2008. Posted in: .NET | PowerShell

A few people have already asked about my PowerShell script for configuring a development environment for .NET / Visual Studio / SDK work, so I thought I might as well break it out into its own script.

Here it is:

###############################################################################
# Configures the .NET / Visual Studio / Windows SDK
# Build environment. Loosely based on the SDK batch files.
#
# First it will try to set up the environment for .NET 3.5
# and VS2008. Failing that, falls back to .NET 3.0/VS2005.
###############################################################################

$NETFXDIR = "$env:WINDIR\Microsoft.NET\Framework"
$FX20 = "$NETFXDIR\v2.0.50727"
$FX35 = "$NETFXDIR\v3.5"

function script:append-path {
   $env:PATH += ';' + $args
}
function script:append-lib {
   if ( test-path('Env:\LIB') ) {
      $env:LIB += ';' + $args
   } else {
      $env:LIB = $args
   }
}
function script:append-include {
   if ( test-path('Env:\INCLUDE') ) {
      $env:INCLUDE += ';' + $args
   } else {
      $env:INCLUDE = $args
   }
}
function script:get-vsdir([string] $version) {
   $regpath = "HKLM:\SOFTWARE\Microsoft\VisualStudio\$version"
   if ( test-path($regpath) ) {
      $regKey = get-itemproperty $regpath
      return $regkey.InstallDir
   }
   return $null
}
function script:set-vsenv([string] $version) {
   $VSDIR = (get-vsdir $version)
   if ( $VSDIR -ne $null ) {
      append-path $VSDIR
      append-path "$VSDIR..\..\VC\bin"
      append-path "$VSDIR..\Tools"
      
      append-include "$VSDIR..\..\VC\include"
      append-lib "$VSDIR..\..\VC\lib"
      return $true
   }
   return $false
}
function script:get-psdkdir {
   $regpath = "HKLM:\SOFTWARE\Microsoft\Microsoft SDKs\Windows\"
   if ( test-path($regpath) ) {
      $regKey = get-itemproperty $regpath
      return $regkey.CurrentInstallFolder
   }
   return $null
}
function script:set-psdkenv {
   $sdkdir = (get-psdkdir)
   if ( ($sdkdir -ne $null) -and (test-path $sdkdir) ) {
      append-path "$sdkdir\bin"
      if ( test-path "$sdkdir\include" ) {
         append-include "$sdkdir\include" 
      }
      if ( test-path "$sdkdir\lib" ) {
         append-lib "$sdkdir\lib"
      }
   }
}

set-psdkenv
# if .NET 3.5 is installed, default to that, otherwise use 2.0
if ( test-path($FX35) ) {
   append-path $FX35
}
append-path $FX20
if ( -not (set-vsenv "9.0") ) {
   [void] (set-vsenv "8.0")
}

Feel free to customize as you see fit :-).

PowerShell and the Windows SDK

March 13, 2008. Posted in: .NET | PowerShell

Nanda Lella from the Windows SDK team is asking for feedback on the next version of the Platform SDK. Specifically, she asks whether a PowerShell-based build environment (as opposed to the current CMD-based one) would be a welcome addition to the SDK.

Personally, I think this is a great idea and hope it becomes a reality [1]. I'm sure that having it out of the box would make PowerShell a lot more popular amongst developers. Repeat after me: PowerShell is not just for sysadmins!

Not having a build environment readily available in PowerShell kept me from making it my default shell for a long time.

Because of this, I eventually sat down and, like many others, wrote a custom profile script to set up the build environment manually, loosely based on the original batch files in the Windows SDK and Visual Studio distributions. It's probably not perfect, but it does the trick, and now I use PowerShell all the time.

Actually, I did make one significant modification to my script compared to the originals in the SDK: I keep machines with .NET 3.5/VS2008 as well as machines with .NET 3.0/VS2005, and wanted a single profile script to avoid having to constantly modify my own environment.

My current profile script first looks for VS2008 and configures the environment for it; failing that, it falls back to the .NET 2.0/3.0 and Visual Studio 2005 configuration. If anyone happens to be interested in it, ping me and I'll be happy to share it.

[1] Actually, there's no reason why this needs to be tied to a specific SDK version, but I imagine that's how it may/will happen. Frankly, it would be good enough for many people to have a pre-written version downloadable from somewhere easily searchable and accessible.

About

Tomas Restrepo is a software developer located in Colombia, South America. His interests include .NET, Connected Systems, PowerShell and lately dynamic programming languages.

email: tomas@winterdom.com
msn: tomasr@passport.com

Copyright © 2002-2008, Tomas Restrepo.