FixEncoding on GitHub

Link. September 3, 2008. Comments [0]. Posted in: BizTalk

I just published the source code for my FixEncoding pipeline components for BizTalk Server 2006 to GitHub as well. FixEncoding provides two custom components that can help you resolve charset/encoding issues when receiving or sending messages.

I haven't touched FixEncoding in a while, but I still find it very helpful on a lot of projects. You can find the new repository here: http://github.com/tomasr/fixencoding/

Why GitHub?

Link. August 31, 2008. Comments [3]. Posted in: BizTalk | Development

Nick Heppleston, a fellow BizTalk blogger and user of my PipelineTesting library, left a comment on a recent post asking why I chose to put the library code on GitHub instead of CodePlex. I think it's a fair question, so let me provide some context.

As many of you are probably aware by now, there has been much talk lately about Distributed Version Control Systems (DVCS) as an alternative to the more traditional, centralized control systems that have been common in the past (and still are). DVCS has gained a lot of traction lately, particularly with Open Source projects, because it really suits the already distributed nature of Open Source development.

For a long time I remained fairly skeptical of DVCS tools. To be honest, I just didn't get what the fuss was about, and centralized systems had worked just fine for me. I use CVS, Subversion and Team Foundation Server on a regular basis, and you can use all of them successfully with your projects. Obviously, each one has its strengths and issues, but they are all very usable tools.

However, during the last year I've been working on a bunch of different projects where the workflow that best suited my work style and my requirements made using a centralized source control system a bit harder than it used to be.

This made me realize that for some of the things I do, a centralized control system just doesn't cut it anymore. In other words, I crossed some invisible threshold where the SCCS stopped being an asset and started becoming a liability. Instead of source control being a painless, frictionless process, it was becoming something I dreaded dealing with. And that's when I finally understood what DVCS was all about.

Why GIT?

So at that point I started to look into DVCS tools and play a bit with them. There's a good discussion of some of the most important DVCS tools around, but in the end I settled on GIT, using the msysgit installation on my Windows machines.

So far, I haven't run into any significant issues when running msysgit; the core stuff seems pretty solid, at least in my experience. I know there are some issues with git-svn in current builds, but I haven't used it yet so I can't comment on that.

I'm still very much a newbie at this, but I'm slowly getting the hang of it, and so far, I'm really liking it. Some aspects of git I really like are:

  • Speed: It's pretty fast in most operations, even with large source code files (like tool-generated ones).
  • Local branches: I love being able to create and switch between branches very easily and fast. Once you realize how easy it is to use them, you start taking advantage of branching a lot more than on regular, centralized version control systems.
  • Single work-tree: Not having to maintain N copies of your work directory at the same time when you're dealing with N branches is a real plus in many cases. Of course, you can choose to do so if you like, but it's not necessary, as it is with other tools.

Why PipelineTesting?

I've always shared the code of my PipelineTesting library through this website. However, I was only publishing snapshots of the code, and while that was fine given how few people use it, it was sometimes a drag. I really did want to share the code more broadly and make it easier to get to some of the changes I was working on even when I had not explicitly released a new official version of the library.

Last year I even commented a bit on this topic and asked for feedback about what the best place to host the code for some of my projects might be, but in the end I didn't make any decision about it.

Why not CodePlex?

CodePlex is a fine site for publishing and hosting your open source projects. I was skeptical about it at first, but it really took off and has a number of things going for it.

The greatest strength that CodePlex has is precisely that it's a Microsoft-technology oriented site. This means that it is a natural choice both when publishing projects that explicitly target the MS stack, and when you're looking for open source projects based on said technology.

I think that, overall, the CodePlex team has done a great job of keeping the site running and making sure it became a viable and valuable service to the community (and Microsoft itself).

The downside of CodePlex is, unfortunately, the technology it is based on: Team Foundation Server. TFS is a fine, robust, centralized source control tool. But it also has a few things that manage to take the fun out of using it:

  • The support for working disconnected from the centralized server is just not good enough. Sure, it has improved a bit since the initial release, but it is far from a frictionless experience.
  • The TFS Client, integrated into Visual Studio. This is supposed to be an asset, but, honestly, I don't always want my source control integrated into my IDE. It can be good sometimes, but it can also be very painful.

Just to set the record straight: Yes, I am aware of the command line tools for driving TFS, and that's certainly an option. Yes, I'm also aware of SvnBridge, which I haven't used myself yet. It is a really good addition to CodePlex, but it means running yet another tool.

Why GitHub?

The surest way to get proficient at something is to do it. I want to learn more about DVCS so that I can improve my workflow, and that means using my tool of choice.

For the time being, I'm choosing to stick with git for my personal projects (and some of my work). Given this choice, GitHub was a natural place to host my public stuff.

There are several aspects of GitHub that I like, but most of all, it's that it is very simple overall, easy to get started with, and mostly stays out of my way. I also find the social aspects of it very intriguing, though naturally I'm not using those yet.

Of course, not everything is perfect in GitHub-land. Some will argue that it doesn't offer as many features as CodePlex in some aspects (like no forums) but that doesn't bother me at this point, as I don't really need that for now.

A bigger issue, however, could be that GitHub is not yet a very visible site among the .NET/BizTalk communities. Heck, I'm pretty sure PipelineTesting is the only BizTalk-related project on it :-). I think that anyone looking for my library is probably going to find it through this weblog first, so I'm not that worried about it, and the BizTalk community itself isn't all that large (it has grown enormously, but it's still small by comparison).

What's next?

I plan to continue working on PipelineTesting and I have a few features in mind for the next release. If anyone wants to contribute or has suggestions/bugs, please let me know about it!

I will continue to offer a copy of the library for download that includes a snapshot of the source code and a pre-compiled copy of the library, like I've been doing so far. People shouldn't have to install git just to get a copy of the library and use it, unless they need something in the code that's not yet in an "official" build. Of course, I'm a nice guy, so if you really really need it, just ask :-).

I also plan to start taking advantage of some GitHub features. In particular, I want to migrate some of the "documentation" that I've written over time as blog posts to a more appropriate format that's easier to maintain and to use. For this, I want to put the GitHub Wiki to use and also add a proper readme file to make it easier to get started with the library.

PipelineTesting: XML Assembler and E_FAIL

Link. August 28, 2008. Comments [0]. Posted in: BizTalk

Fellow BizTalk developer Bram Veldhoen was kind enough to send me some suggestions for a future version of my PipelineTesting library, as well as a question that could point to a potential bug in the library.

The problem basically revolves around consuming a stream returned by the XML Assembler component in a send pipeline when testing with the library. What Bram noticed was that executing the pipeline would seem to work, but trying to read the body part stream of the output message would fail with a COMException with error code 0x80004005 (E_FAIL).

I was fairly confident this should've been working, based on my own use of the library, but I sat down to test it just to make sure. What I discovered was that indeed this can happen if the pipeline context for the test is not aware of the schema for the message being processed by the pipeline.

I added a new test to the library to make sure this was working correctly:

[Test]
public void CanReadXmlAssemblerStream() {
   SendPipelineWrapper pipeline = Pipelines.Xml.Send()
      .WithSpec<Schema3_FF>();
   IBaseMessage input = MessageHelper.CreateFromStream(
      DocLoader.LoadStream("CSV_XML_SendInput.xml")
      );
   IBaseMessage output = pipeline.Execute(input);
   Assert.IsNotNull(output);
   // doc should load fine
   XmlDocument doc = new XmlDocument();
   doc.Load(output.BodyPart.Data);
   XmlNodeList fields = doc.SelectNodes("//*[local-name()='Field3']");
   Assert.Greater(fields.Count, 0);
}

There are a few things to keep in mind about this issue:

  1. If you're using the XML Assembler, make sure your pipeline context has all the necessary schemas. There are three ways you can do this, depending on how you are creating the pipeline:
    1. If you're using the original raw API, you can use the AddDocSpec() method of the SendPipelineWrapper class.
    2. If you're using the new, simple API, you can add the schema through the WithSpec() method, which is what the test above does.
    3. If you're using the simple API, but you're dynamically creating the pipeline, you can just add the schemas directly in the XmlAssembler configuration using the WithDocumentSpec() and WithEnvelopeSpec() methods (see the XmlAssembler.cs file for details).
  2. Make sure you're testing the right thing. Sometimes, it's enough to make sure that the pipeline can be executed successfully. Remember, however, that pipelines are streaming beasts, so a lot of the work will oftentimes happen just when you read the resulting stream, thus causing the processing to happen.
    This is exactly the scenario we're seeing here today.

The second point is really important, but, for some reason, I never put much emphasis on it when creating the library or when talking about it. I think this is important enough to warrant doing something about it.

For starters, I've committed a few changes to the PipelineTesting repository. Besides adding the test above, I've also added a few ConsumeStream() and ReadString() helper methods to the MessageHelper class to make it easier to validate your components work by simply reading the entire stream from a message. I'll add a few other helper methods for this later on, but the idea is to make it so that you can write less code for your tests.
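
To make the point about streaming concrete, here is a minimal sketch of what "consume the stream" style helpers could look like. This is not the actual MessageHelper code from the repository: the StreamUtil class is a hypothetical stand-in, and it only uses System.IO so that it can run without BizTalk installed. In a real test you would pass it the body part stream of the output message.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helpers for illustration only; they are NOT the
// library's actual MessageHelper implementation.
public static class StreamUtil {
   // Reads the stream to the end, discarding the data, and returns
   // the number of bytes read. Any deferred work attached to the
   // stream runs (and can fail) while it is being read.
   public static long ConsumeStream(Stream stream) {
      if ( stream == null ) throw new ArgumentNullException("stream");
      byte[] buffer = new byte[4096];
      long total = 0;
      int read;
      while ( (read = stream.Read(buffer, 0, buffer.Length)) > 0 ) {
         total += read;
      }
      return total;
   }

   // Reads the entire stream as text using the given encoding.
   public static string ReadString(Stream stream, Encoding encoding) {
      if ( stream == null ) throw new ArgumentNullException("stream");
      using ( StreamReader reader = new StreamReader(stream, encoding) ) {
         return reader.ReadToEnd();
      }
   }
}

public static class Demo {
   public static void Main() {
      // A MemoryStream stands in for a pipeline output stream here.
      byte[] data = Encoding.UTF8.GetBytes("<Root><Field3>x</Field3></Root>");
      Console.WriteLine(StreamUtil.ConsumeStream(new MemoryStream(data)));
      Console.WriteLine(StreamUtil.ReadString(new MemoryStream(data), Encoding.UTF8));
   }
}
```

The idea is that a test which only checks that the pipeline executed can pass even when the output can't be read, while a test that drains the stream forces the deferred processing to happen inside the test.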

Back to the Past

Link. August 7, 2008. Comments [0]. Posted in: Architecture | BizTalk

Every so often I see people asking in the newsgroups how to solve certain challenges they encounter while working on their BizTalk applications. One common question revolves around being able to go "back to the past" when an error happens during processing of a message.

This isn't a bad question at all, and usually revolves around how to simulate the behavior of atomic transactions in an environment where transactions can be a lot more complex and not always as natural.

The question usually goes like this: "I'm receiving a message in BizTalk, which is triggering an orchestration instance. The orchestration does this and that, and if any of those things fail, I want to put the message back where I got it from".

This-That-There

The question might seem simple, but it's not always necessarily so. In fact, sometimes you have to stop a moment and ask yourself whether this really makes sense. There are several aspects you need to consider:

  1. Handling the case where "this" causes an error is probably not a big deal. Handling the case where "this" succeeded but "that" failed, however, might not be that simple. Not all actions your orchestration might do can be undone.
  2. Most of the time you'll find that both actions can't be done as a single unit in a single atomic transaction. Fortunately, BizTalk provides very good support for long-running transactions and compensation which can help quite a bit.

    Unfortunately, long-running transactions and compensation models are often misunderstood (cue in the inevitable "How long does a transaction have to last to be a long-running transaction?" jokes/questions).

    Here are a few articles that do a great job of describing the BizTalk Transaction features and how to use them effectively:
  3. The sentence "put the message back where I got it from" can be either a very good thing, or a very problematic thing. It basically relates to leaving stuff as you found it; in particular, putting the message back at its origin (thus relating to the transactional concept of "nothing happened here, move along") so that you can try processing it again later on, when it will hopefully succeed.

The problem with number 3 is that (a) it isn't always possible, and (b) it isn't always a good idea.

It might not be possible to put the message back where you got it if someone was pushing the message to you instead of you pulling it from somewhere. If you had a SOAP/HTTP WebService exposed that received a message from someone else, then you probably can't put the message back where you got it from!

On the other hand, this is a very common model for queued messaging systems: If you run into an error processing the message, you put it back into the queue and try again later. And this works great many times and can simplify error handling a great deal.

The point where this becomes a problem is when you rely on this as your only error handling mechanism. If you blindly send the message back to its origin to retry processing for any and all errors and a message comes in that always fails, you've got yourself a poison message!

I've already talked about Poison Messages in the past, so I won't comment much more on them. But there are other things you can keep in mind to improve the "back to the past" error handling technique, particularly if you don't care about message processing:

  1. If you can identify and classify the source/cause of the errors, you can make your orchestration smarter about how to handle them. For example:
    • Can you distinguish transient error conditions? For example, a timeout connecting to the database might be a temporary condition because of a network fluke or a server being restarted. Sometimes retrying the operation after a short while is enough to deal with this situation effectively.
    • Can you distinguish errors that might require manual intervention to fix? Example: Validating an operation fails because some configuration data is missing. This is a case where you want to be proactive and raise an appropriate alert so that someone can get in there and fix the issue. Extra points if you can tell apart conditions that require intervention from a business user and those that require it from a systems administrator.

      Notice, however, that in this case putting the message back at the start right after creating the notification is not the right thing to do. People don't react that fast. You need to set the message aside until such time as the corrective measure has been taken and it is safe to try processing it again.
  2. Can you control when the retry might happen? Can you throttle it if necessary? If the answer is no, then you might want to be very careful about using this technique. You could easily increase the system load substantially if lots of messages fail in a short time and you try reprocessing them in a tight loop.
  3. Be mindful of adapters that provide no ordering semantics. For example, if your original location used the FILE adapter and you put the message back in the original folder, it will likely get picked up again for processing very soon, which can quickly get you back to point 2.

    At least with an adapter like MSMQ you can push the message to the end of the queue, which might buy you some time.
  4. Even if you take 1, 2 and 3 into account, you still need to provide a way to deal with poison messages. Keep in mind that what started as a transient error condition can suddenly escalate to a full-blown problem you can't do anything about, like when that temporary network fluke turns into a days-long outage after some idiot digging a hole outside snaps your network fiber cable in two.

    In fact, sometimes you might need to go so far as to completely shut down processing. Sometimes being able to detect that some things that should be working keep failing after an extended period of time and alerting about it can help get things sorted out before they spiral out of control.
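
As a rough illustration of the points above, here is a small, self-contained sketch of how an error handler could decide what to do with a failed message. Everything in it is an assumption made for the example: the ErrorKind and Disposition enums, the RetryPolicy class, the maximum of 5 attempts, and the use of a capped exponential delay as one possible way to throttle retries. None of it is a BizTalk API; in an orchestration the same decisions would be expressed with exception handlers, delay shapes and send ports.

```csharp
using System;

// Hypothetical classification of failures (point 1 above).
public enum ErrorKind { Transient, NeedsIntervention, Unknown }

// Hypothetical outcomes for a failed message.
public enum Disposition { RetryLater, SetAsideAndAlert, MoveToPoisonStore }

public static class RetryPolicy {
   // Arbitrary limit chosen for the example.
   public const int MaxAttempts = 5;

   public static Disposition Decide(ErrorKind kind, int attempts) {
      // Point 4: a message that keeps failing is treated as poison,
      // even if each individual failure looked transient.
      if ( attempts >= MaxAttempts ) return Disposition.MoveToPoisonStore;
      switch ( kind ) {
         case ErrorKind.Transient:
            return Disposition.RetryLater;
         case ErrorKind.NeedsIntervention:
            // Don't put it back right away; people don't react that fast.
            return Disposition.SetAsideAndAlert;
         default:
            // Assumption: errors we can't classify are not retried blindly.
            return Disposition.MoveToPoisonStore;
      }
   }

   // Point 2: throttle retries. Here the delay doubles with each
   // attempt, starting at 5 seconds and capped at 5 minutes.
   public static TimeSpan DelayBeforeRetry(int attempts) {
      double seconds = Math.Min(300, 5 * Math.Pow(2, attempts));
      return TimeSpan.FromSeconds(seconds);
   }
}

public static class Demo {
   public static void Main() {
      Console.WriteLine(RetryPolicy.Decide(ErrorKind.Transient, 1));
      Console.WriteLine(RetryPolicy.Decide(ErrorKind.NeedsIntervention, 1));
      Console.WriteLine(RetryPolicy.Decide(ErrorKind.Transient, 5));
      Console.WriteLine(RetryPolicy.DelayBeforeRetry(3));
   }
}
```

The design choice worth noticing is that the attempt count is checked before the error kind, so the "put it back and retry" path can never loop forever on the same message.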

These are just some ideas that might help make your system more reliable and more manageable. Some of them do cost money; that is, you have to invest time and development/testing effort in getting them done, and that's where you're going to have to evaluate what makes sense and what doesn't.

CreateXmlInstance() with Multi-Root Schemas

Link. July 16, 2008. Comments [1]. Posted in: BizTalk

The DocumentSpec class in the Microsoft.BizTalk.Component.Interop namespace of the Microsoft.BizTalk.Pipelines assembly is commonly used in custom pipeline components (particularly assemblers and disassemblers) to represent a compiled BizTalk schema (a document specification).

This class has one interesting method: CreateXmlInstance(), which can generate a sample XML instance based on the associated schema [1]. Recently I saw a question in the BizTalkGurus forum about how to use this functionality if you had a schema definition with multiple potential root elements.

Turns out it's pretty easy, once you understand how DocumentSpec works and how Schemas are compiled into BizTalk assemblies.

The constructor for the DocumentSpec class takes two arguments: The docSpecName (schema name) and the name of the assembly it is defined in.

The key to supporting multi-root schemas is that the docSpecName is, in reality, the namespace + type name of the class generated by the compiler when you compile the Schema into the BizTalk assembly. Each schema becomes one class in the generated code.

If the schema has a single root, that's the end of it; all you need to know is that docSpecName == Namespace.ClassName. If the schema has multiple roots, however, each root becomes a nested class within the schema class.

The way to select which root element to use for the sample instance generation, then, is to provide the type name of the root element's nested class to the DocumentSpec constructor instead of the name of the schema class. In other words, in this case docSpecName == Namespace.SchemaClass+RootClass.
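
The "+" in that name is simply how .NET reflection writes the full name of a nested type, so you can get the right string from the type itself instead of typing it by hand. Here is a small sketch that runs without BizTalk; the Sample.Schemas namespace and the OrderSchema, Order and OrderResponse classes are made-up stand-ins for what the schema compiler would generate for a multi-root schema.

```csharp
using System;

namespace Sample.Schemas {
   // Hypothetical stand-in for the class generated from a schema
   // with two possible root elements.
   public class OrderSchema {
      public class Order { }
      public class OrderResponse { }
   }
}

public static class Demo {
   public static void Main() {
      Type schema = typeof(Sample.Schemas.OrderSchema);
      Type root = typeof(Sample.Schemas.OrderSchema.OrderResponse);

      // Single-root style name: Namespace.ClassName
      Console.WriteLine(schema.FullName);  // Sample.Schemas.OrderSchema

      // Multi-root style name: Namespace.SchemaClass+RootClass
      Console.WriteLine(root.FullName);    // Sample.Schemas.OrderSchema+OrderResponse

      // The assembly name is the second piece of information the
      // DocumentSpec constructor is described as taking above.
      Console.WriteLine(root.Assembly.FullName);
   }
}
```

With the BizTalk assemblies referenced, root.FullName and root.Assembly.FullName would be the two strings to hand to the DocumentSpec constructor as described above, before calling CreateXmlInstance().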

[1] One annoyance of this method is that it takes a TextWriter argument but doesn't actually write the generated XML into it; instead it returns a Stream you need to read!

About

Tomas Restrepo is a software developer located in Colombia, South America. His interests include .NET, Connected Systems, PowerShell and lately dynamic programming languages.

email: tomas@winterdom.com
msn: tomasr@passport.com

Copyright © 2002-2008, Tomas Restrepo.